Preface



SAC '98 is the fifth in a series of annual workshops on Selected Areas in Cryptography. SAC '94 and SAC '96 were held at Queen's University in Kingston and SAC '95 and SAC '97 were held at Carleton University in Ottawa. The purpose of the workshop is to bring together researchers in cryptography to present new work on areas of current interest. It is our hope that focusing on selected topics will present a good opportunity for in-depth discussion in a relaxed atmosphere.
The themes for the SAC '98 workshop were:

• Design and Analysis of Symmetric Key Cryptosystems 

• Efficient Implementations of Cryptographic Systems

• Cryptographic Solutions for Internet Security 

• Secure Wireless/Mobile Networks 

Of the 39 papers submitted to SAC '98, 26 were accepted and two related papers were merged into one. There were also two invited presentations, one by Alfred Menezes entitled "Key Agreement Protocols" and the other by Eli Biham entitled "Initial Observations on Skipjack: Cryptanalysis of SkipJack-3XOR". There were 65 participants at the workshop.

The Program Committee members for SAC '98 were Carlisle Adams, Tom Cusick, Howard Heys, Henk Meijer, Doug Stinson, Stafford Tavares, Serge Vaudenay, and Michael Wiener. We also thank the following persons who acted as reviewers for SAC '98: Zhi-Guo Chen, Mike Just, Liam Keliher, Alfred Menezes, Serge Mister, Phong Nguyen, David Pointcheval, Thomas Pornin, Guillaume Poupard, Yiannis Tsiounis, Amr Youssef, and Robert Zuccherato.

This year, in addition to the Workshop Record distributed at the workshop, the papers presented at SAC '98 are published by Springer-Verlag in the Lecture Notes in Computer Science series. Copies of the Springer Proceedings are being sent to all registrants.

The organizers of SAC '98 are pleased to thank Entrust Technologies for their financial support and Sheila Hutchison of the Department of Electrical and Computer Engineering at Queen's University for administrative and secretarial help. Yifeng Shao put together the Workshop Record and provided invaluable assistance in the preparation of these Proceedings. We also thank Laurie Ricker who looked after registration.
Serge Vaudenay

Abstract. Recently, we showed how to strengthen block ciphers by decorrelation techniques. In particular, we proposed two practical block ciphers, one based on $\mathrm{GF}(2^n)$ arithmetic, the other based on the $x \bmod p \bmod 2^n$ primitive with a prime $p = 2^n(1+\delta)$. In this paper we show how to achieve similar decorrelation with a prime $p = 2^n(1-\delta)$. For this we have to change the choice of the norm in the decorrelation theory and replace the $L_\infty$ norm by the $L_2$ norm. We propose a new practical block cipher which is provably resistant against differential and linear cryptanalysis.



At the STACS'98 conference, the author of the present paper presented the technique of decorrelation, which makes it possible to strengthen block ciphers in order to make them provably resistant against basic differential and linear cryptanalysis. This analysis, which is based on Carter and Wegman's paradigm of universal functions, has been used with the $L_\infty$-associated matrix norm in order to propose two new practical block cipher families which are provably resistant against those attacks: COCONUT98 and PEANUT98. This technique has been shown to yield real-life encryption algorithms, as shown by an Advanced Encryption Standard submission and related implementation evaluations on smart cards. In this paper we present some earlier results based on the $L_2$ norm in order to build a new practical block cipher, PEANUT97.^

1 Basic Definitions 

We briefly recall the basic definitions used in the decorrelation theory. Firstly, 
let us recall the notion of d-wise distribution matrix associated to a random 
function. 

Definition 1. Given a random function F from a given set $\mathcal{A}$ to a given set $\mathcal{B}$ and an integer d, we define the "d-wise distribution matrix" $[F]^d$ of F as an $\mathcal{A}^d \times \mathcal{B}^d$-matrix where the $(x,y)$-entry of $[F]^d$ corresponding to the multi-points $x = (x_1,\ldots,x_d) \in \mathcal{A}^d$ and $y = (y_1,\ldots,y_d) \in \mathcal{B}^d$ is defined as the probability that we have $F(x_i) = y_i$ for $i = 1,\ldots,d$.

^ A full paper version is available on the web site 

^ The decorrelation technique with the $L_2$ norm happens to be somewhat less easy than with the $L_\infty$ norm, which is why it has not been published so far.

S. Tavares and H. Meijer (Eds.): SAC'98, LNCS 1556, pp. 1-14, 1999.

© Springer-Verlag Berlin Heidelberg 1999 
Secondly, we recall the definition of two matrix norms: the $L_\infty$-associated norm denoted $|||\cdot|||_\infty$, and the $L_2$-norm denoted $\|\cdot\|_2$.

Definition 2. Given a matrix A, we define

$$\|A\|_2 = \sqrt{\sum_{x,y} (A_{x,y})^2} \qquad (1)$$

$$|||A|||_\infty = \max_x \sum_y |A_{x,y}| \qquad (2)$$

where the sums run over the (x, y)-entries of the matrix.

Finally, here is the definition of the general d-wise decorrelation distance between two random functions.

Definition 3. Given two random functions F and G from a given set $\mathcal{A}$ to a given set $\mathcal{B}$, an integer d and a matrix norm $\|\cdot\|$ over the vector space $\mathbb{R}^{\mathcal{A}^d \times \mathcal{B}^d}$, we call $\|[F]^d - [G]^d\|$ the "d-wise decorrelation $\|\cdot\|$-distance" between F and G. In addition, we call "d-wise decorrelation $\|\cdot\|$-bias" of a random function (resp. permutation)^ F its d-wise decorrelation $\|\cdot\|$-distance to a random function (resp. permutation) with a uniform distribution.

We consider block ciphers on a message-block space $\mathcal{M}$ with a key represented by a random variable K as a random permutation $C_K$ defined by K over $\mathcal{M}$. Since the subscript K is useless in our context we omit it and consider the random variable C as a random permutation with a given distribution. Ideally, we consider the Perfect Cipher $C^*$ for which the distribution of $C^*$ is uniform over the set of the permutations over $\mathcal{M}$. Hence for any multi-point $x = (x_1,\ldots,x_d)$ with pairwise distinct $x_i$'s and any multi-point $y = (y_1,\ldots,y_d)$ with pairwise distinct $y_i$'s we have

$$[C^*]^d_{x,y} = \Pr[C^*(x_i) = y_i;\ i = 1,\ldots,d] = \frac{1}{\#\mathcal{M}(\#\mathcal{M}-1)\cdots(\#\mathcal{M}-d+1)}.$$
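As a small sanity check of Definition 1 and of the formula for $[C^*]^d$, the following Python sketch computes entries of the d-wise distribution matrix by brute force on a toy message space of four elements. It assumes that a random function is given as a uniformly chosen lookup table from an explicit list. The helper names `distribution_entry` and `perfect_cipher_entry` are illustrative and do not come from the paper.

```python
import itertools
from fractions import Fraction
from typing import Sequence


def distribution_entry(functions: Sequence[Sequence[int]],
                       xs: Sequence[int], ys: Sequence[int]) -> Fraction:
    """Entry [F]^d_{x,y}: probability that F(x_i) = y_i for i = 1, ..., d.

    F is uniformly distributed over `functions`, each given as a lookup table.
    """
    hits = sum(1 for f in functions if all(f[x] == y for x, y in zip(xs, ys)))
    return Fraction(hits, len(functions))


def perfect_cipher_entry(size: int, d: int) -> Fraction:
    """1 / (#M (#M - 1) ... (#M - d + 1)), valid for pairwise distinct x's and y's."""
    denominator = 1
    for i in range(d):
        denominator *= size - i
    return Fraction(1, denominator)


# Perfect Cipher on a 4-element message space: uniform over all 24 permutations.
all_perms = list(itertools.permutations(range(4)))
print(distribution_entry(all_perms, (0, 1), (2, 3)))  # 1/12
print(perfect_cipher_entry(4, 2))                      # 1/12
```

When two of the $y_i$ coincide while the $x_i$ are distinct, the entry is 0, because a permutation never maps two distinct points to the same value.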



We are interested in the decorrelation bias $\|[C]^d - [C^*]^d\|$ of a practical cipher C.

We recall that $|||\cdot|||_\infty$ and $\|\cdot\|_2$ are matrix norms (i.e. the norm of any matrix product $A \times B$ is at most the product of the norms of A and B), which makes the decorrelation bias a friendly measurement, as shown by the following lemma.

^ The strange $|||\cdot|||_\infty$ notation used in earlier papers comes from the fact that this norm is associated to the usual $\|\cdot\|_\infty$ norm over vectors, defined by $\|V\|_\infty = \max_x |V_x|$, through

$$|||A|||_\infty = \max_{\|V\|_\infty = 1} \|AV\|_\infty.$$

^ It is thus important to specify whether we are considering a function or a permutation.
Lemma 4. Let $\|\cdot\|$ be a norm such that $\|A \times B\| \le \|A\|\cdot\|B\|$ for any matrices A and B. For any independent random ciphers denoted $C_1, C_2, C_3, C_4, C^*$ (where $C^*$ is perfect), the following properties hold.

$$\|[C_1 \circ C_2]^d - [C^*]^d\| \le \|[C_1]^d - [C^*]^d\| \cdot \|[C_2]^d - [C^*]^d\| \qquad (3)$$

$$\|[C_1 \circ C_2]^d - [C_1 \circ C_3]^d\| \le \|[C_1]^d - [C^*]^d\| \cdot \|[C_2]^d - [C_3]^d\| \qquad (4)$$

$$\|[C_1 \circ C_2]^d - [C_3 \circ C_4]^d\| \le \|[C_1]^d - [C^*]^d\| \cdot \|[C_2]^d - [C_4]^d\| + \|[C_1]^d - [C_3]^d\| \cdot \|[C_4]^d - [C^*]^d\| \qquad (5)$$

Those properties come from the easy facts $[C_1 \circ C_2]^d = [C_2]^d \times [C_1]^d$ and $[C^*]^d \times [C]^d = [C]^d \times [C^*]^d = [C^*]^d$.
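Lemma 4 can be checked numerically on a small instance. The Python sketch below uses exact fractions and the case d = 1. It builds $[C]^1$ for two hypothetical toy ciphers on three elements, each given as a list of (permutation, probability) pairs, then verifies the product rule $[C_1 \circ C_2]^1 = [C_2]^1 \times [C_1]^1$ and inequality (3) for both norms of Definition 2. This illustrates one instance and is not a proof.

```python
import math
from fractions import Fraction
from itertools import permutations


def dist_matrix(cipher, size):
    """[C]^1 (d = 1) for a random permutation given as (permutation, probability) pairs."""
    mat = [[Fraction(0)] * size for _ in range(size)]
    for perm, prob in cipher:
        for x in range(size):
            mat[x][perm[x]] += prob
    return mat


def compose(c1, c2):
    """Distribution of C1 o C2, i.e. x -> c1(c2(x)), for independent C1 and C2."""
    return [(tuple(p1[p2[x]] for x in range(len(p2))), q1 * q2)
            for p1, q1 in c1 for p2, q2 in c2]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sub(a, b):
    return [[u - v for u, v in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def norm2(a):      # Equation (1)
    return math.sqrt(sum(float(v) ** 2 for row in a for v in row))


def norm_inf(a):   # Equation (2): largest absolute row sum
    return max(sum(abs(v) for v in row) for row in a)


size = 3
c1 = [((0, 1, 2), Fraction(1, 2)), ((1, 2, 0), Fraction(1, 2))]   # hypothetical toy cipher
c2 = [((0, 1, 2), Fraction(2, 3)), ((0, 2, 1), Fraction(1, 3))]   # hypothetical toy cipher
c_star = [(p, Fraction(1, 6)) for p in permutations(range(size))]  # Perfect Cipher

m1, m2, m_star = (dist_matrix(c, size) for c in (c1, c2, c_star))
m12 = dist_matrix(compose(c1, c2), size)
print(m12 == matmul(m2, m1))  # True: [C1 o C2] = [C2] x [C1]
for norm in (norm2, norm_inf):
    print(norm(sub(m12, m_star)) <= norm(sub(m1, m_star)) * norm(sub(m2, m_star)))
```

Inequality (3) holds here because $[C_2][C_1] - [C^*] = ([C_2] - [C^*])([C_1] - [C^*])$, which uses the fact that permutation matrices are doubly stochastic.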

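Feistel Ciphers are defined over $\mathcal{M} = \mathcal{M}_0^2$ for a given group $\mathcal{M}_0$ (e.g. $\mathcal{M}_0 = \mathbb{Z}_2^{m/2}$) by round functions $F_1,\ldots,F_r$ on $\mathcal{M}_0$. We let $C = \Psi(F_1,\ldots,F_r)$ denote the cipher defined by $C(x^l, x^r) = (y^l, y^r)$ where we iteratively compute a sequence $(x_i^l, x_i^r)$ such that

$$x_0^l = x^l \quad\text{and}\quad x_0^r = x^r$$
$$x_i^l = x_{i-1}^r \quad\text{and}\quad x_i^r = x_{i-1}^l + F_i(x_{i-1}^r) \qquad (i = 1,\ldots,r)$$
$$y^l = x_r^l \quad\text{and}\quad y^r = x_r^r$$

(see Feistel).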
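Here is a minimal Python sketch of $\Psi(F_1,\ldots,F_r)$. It assumes $\mathcal{M}_0 = \mathbb{Z}_2^k$ with XOR as the group operation and round functions given as random lookup tables. The names `psi`, `psi_inverse` and `random_table_function` are hypothetical. The sketch shows that the cipher is a permutation even though the round functions are not, and that two rounds give $y^l = x^l + F_1(x^r)$ and $y^r = x^r + F_2(y^l)$.

```python
import random
from typing import Callable, List, Sequence, Tuple

RoundFunction = Callable[[int], int]


def psi(rounds: Sequence[RoundFunction], xl: int, xr: int) -> Tuple[int, int]:
    """Psi(F_1, ..., F_r)(x^l, x^r) over M_0 = Z_2^k, with XOR as the group operation."""
    left, right = xl, xr
    for f in rounds:
        # x_i^l = x_{i-1}^r  and  x_i^r = x_{i-1}^l + F_i(x_{i-1}^r)
        left, right = right, left ^ f(right)
    return left, right  # (y^l, y^r) = (x_r^l, x_r^r)


def psi_inverse(rounds: Sequence[RoundFunction], yl: int, yr: int) -> Tuple[int, int]:
    """Undo the rounds in reverse order; the F_i never need to be inverted."""
    left, right = yl, yr
    for f in reversed(rounds):
        left, right = right ^ f(left), left
    return left, right


def random_table_function(k: int, rng: random.Random) -> RoundFunction:
    """A uniformly random function on k-bit strings, stored as a lookup table."""
    table: List[int] = [rng.randrange(1 << k) for _ in range(1 << k)]
    return lambda v: table[v]


rng = random.Random(1998)
k = 3
fs = [random_table_function(k, rng) for _ in range(4)]
images = {psi(fs, l, r) for l in range(1 << k) for r in range(1 << k)}
print(len(images) == 1 << (2 * k))  # True: Psi is a permutation
```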

To illustrate the problem, we stress that perfect decorrelation (i.e. a decorrelation bias of zero) is achievable on a finite field (no matter which norm we take). For instance, a random polynomial of degree at most d - 1 with uniformly distributed coefficients is a perfectly d-wise decorrelated function. A random affine permutation with a uniform distribution is a perfectly pairwise decorrelated permutation. (Perfect decorrelation of higher degree is much more complicated.) Finite field arithmetic is however cumbersome in software for the traditional characteristic two. This is why we studied decorrelation biases.
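To make the claim about affine maps concrete, the following sketch enumerates all keys over a small prime field GF(p). The primes p = 5 and p = 7 are chosen only for speed. It checks that $F(x) = ax + b$ with uniform $(a, b)$ has exactly the pairwise distribution of a uniformly random function. It also checks that, restricted to $a \neq 0$, the map has exactly the pairwise distribution of a uniformly random permutation.

```python
from fractions import Fraction
from itertools import product


def pair_probability(p: int, x1: int, x2: int, y1: int, y2: int, permutation: bool) -> Fraction:
    """Pr[F(x1) = y1, F(x2) = y2] for F(x) = a*x + b mod p with (a, b) uniform.

    a ranges over GF(p)* for the permutation family, over GF(p) otherwise.
    """
    a_values = range(1, p) if permutation else range(p)
    pairs = [(a, b) for a in a_values for b in range(p)]
    hits = sum(1 for a, b in pairs if (a * x1 + b) % p == y1 and (a * x2 + b) % p == y2)
    return Fraction(hits, len(pairs))


def is_perfectly_pairwise_decorrelated(p: int) -> bool:
    """Compare with a uniform random function (1/p^2) and permutation (1/(p(p-1)))."""
    for x1, x2, y1, y2 in product(range(p), repeat=4):
        if x1 == x2:
            continue  # a single output value, uniform because b is uniform
        if pair_probability(p, x1, x2, y1, y2, permutation=False) != Fraction(1, p * p):
            return False
        expected = Fraction(1, p * (p - 1)) if y1 != y2 else Fraction(0)
        if pair_probability(p, x1, x2, y1, y2, permutation=True) != expected:
            return False
    return True


print(is_perfectly_pairwise_decorrelated(5))  # True
```

For $x_1 \neq x_2$ the system $ax_1 + b = y_1$, $ax_2 + b = y_2$ has exactly one solution $(a, b)$. This is why the probabilities match exactly, and why the construction needs a field.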

2 Previous Security Results 

Decorrelation makes it possible to quantify the security of imperfectly decorrelated ciphers. Here we consider the security in the Luby-Rackoff model. We consider opponents as Turing machines which have limited access to an encryption oracle device and whose aim is to distinguish whether the device implements a given practical cipher $C_1 = C$ or a given cipher $C_2$, which is usually $C_2 = C^*$. When fed with an oracle c, the Turing machine $T^c$ returns either 0 or 1. If we want to distinguish a random cipher C from $C^*$, we let p (resp. $p^*$) denote $\Pr[T^C = 1]$ (resp. $\Pr[T^{C^*} = 1]$) where the probability is over the distribution of the random tape of T and the distribution of the cipher. We say the attack is successful if $|p - p^*|$ is large. On the other hand, we say that the cipher C resists the attack if we have $|p - p^*| \le \varepsilon$ for some small $\varepsilon$. This model is quite powerful, because if we prove that a cipher C cannot be distinguished from the Perfect Cipher $C^*$, then any attempt to decrypt a ciphertext provided by C will also be applicable to the cipher $C^*$, for which we know the security. (For more motivation on this security model, see Luby and Rackoff.)

Inspired by Biham and Shamir's attack, we call differential distinguisher with the (fixed) characteristic (a, b) and complexity n the following algorithm:

Input: a cipher c, a complexity n, a characteristic (a, b)

1. for i from 1 to n do

(a) pick uniformly a random X and query for c(X) and c(X ⊕ a)

(b) if c(X ⊕ a) = c(X) ⊕ b, stop and output 1

2. output 0
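The differential distinguisher translates directly into code. In this Python sketch the oracle is any callable on m-bit integers and ⊕ is Python's `^`. The toy cipher in the example is a hypothetical XOR-with-a-constant map, for which the characteristic (a, a) always holds.

```python
import random
from typing import Callable


def differential_distinguisher(c: Callable[[int], int], n: int, a: int, b: int,
                               m: int, rng: random.Random) -> int:
    for _ in range(n):                      # step 1
        x = rng.randrange(1 << m)           # step 1(a): X uniform over Z_2^m
        if c(x ^ a) == c(x) ^ b:            # step 1(b)
            return 1
    return 0                                # step 2


def xor_with_constant(x: int) -> int:       # hypothetical toy cipher on 8-bit blocks
    return x ^ 0x5A


rng = random.Random(0)
print(differential_distinguisher(xor_with_constant, 10, 0x03, 0x03, 8, rng))  # 1
print(differential_distinguisher(lambda x: x, 10, 0x01, 0x02, 8, rng))        # 0
```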

Similarly, inspired by Matsui's attack, we call linear distinguisher with the characteristic (a, b) and complexity n the following algorithm:^

Input: a cipher c, a complexity n, a characteristic (a, b), a set A

1. initialize the counter value t to zero

2. for i from 1 to n do

(a) pick a random X with a uniform distribution and query for c(X)

(b) if X · a = c(X) · b, increment the counter t

3. if t ∈ A, output 1, otherwise output 0
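Similarly, here is a sketch of the linear distinguisher. The dot product is computed as the parity of a bitwise AND, and the acceptance set A is passed as any container of counter values. The acceptance set used in the example (counters far from n/2) is an assumption for illustration, since the algorithm leaves A as a parameter.

```python
import random
from typing import Callable, Container


def parity(v: int) -> int:
    """Parity of the bits of v; parity(x & a) is the dot product x . a over Z_2."""
    return bin(v).count("1") & 1


def linear_distinguisher(c: Callable[[int], int], n: int, a: int, b: int,
                         accept: Container[int], m: int, rng: random.Random) -> int:
    t = 0                                   # step 1: counter
    for _ in range(n):
        x = rng.randrange(1 << m)           # step 2(a): X uniform over Z_2^m
        if parity(x & a) == parity(c(x) & b):
            t += 1                          # step 2(b)
    return 1 if t in accept else 0          # step 3


n = 64
far_from_half = set(range(0, n // 4)) | set(range(3 * n // 4 + 1, n + 1))  # assumed choice of A
rng = random.Random(0)
print(linear_distinguisher(lambda x: x, n, 0x05, 0x05, far_from_half, 8, rng))  # 1 (t = n)
```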



Both linear and differential distinguishers are particular cases of iterated distinguisher attacks (see [3]).



Theorem 5. Let C be a cipher on the space $\mathcal{M} = \mathbb{Z}_2^m$, let $C^*$ be the Perfect Cipher, and let $\varepsilon = |||[C]^2 - [C^*]^2|||_\infty$. For any differential distinguisher between C and the Perfect Cipher $C^*$ with complexity n, the advantage $|p - p^*|$ is at most $\frac{n}{2^m - 1} + n\varepsilon$. Similarly, for any linear distinguisher, the advantage is such that

$$\lim_{m \to +\infty} \frac{|p - p^*|}{n^{1/3}} \le 9.3\left(\frac{1}{2^m - 1} + 2\varepsilon\right)^{1/3}.$$

This theorem means that C is immune against any differential or linear distinguisher if $|||[C]^2 - [C^*]^2|||_\infty \approx 2^{-m}$. In this paper, we show that we can obtain similar results with $L_2$-decorrelation and that we can use them for an efficient real-life cipher.



3 Security by $L_2$-Decorrelation

It is well known that differential and linear cryptanalysis with characteristic (a, b) respectively depend on the following measurements. If C is a random cipher on $\mathbb{Z}_2^m$, where ⊕ denotes the group operation (the bitwise XOR) and · denotes the dot product (the parity of the bitwise AND), we denote

^ For differential and linear cryptanalysis, we assume that the message space $\mathcal{M}$ is such that the addition + is the bitwise exclusive OR and the dot product · is the parity of the bitwise AND.
$$\mathrm{EDP}^C(a,b) = E_C\left(\Pr_X[C(X \oplus a) = C(X) \oplus b]\right) = 2^{-m}\sum_{\substack{x_1 \oplus x_2 = a\\ y_1 \oplus y_2 = b}} [C]^2_{x,y}$$

$$\mathrm{ELP}^C(a,b) = E_C\left(\left(2\Pr_X[X \cdot a = C(X) \cdot b] - 1\right)^2\right) = 1 - 2^{2-2m}\sum_{\substack{x_1 \cdot a = y_1 \cdot b\\ x_2 \cdot a \neq y_2 \cdot b}} [C]^2_{x,y}$$

where $x = (x_1, x_2)$ and $y = (y_1, y_2)$, and X is uniformly distributed. Theorem 5 comes from the upper bounds

$$|\mathrm{EDP}^C(a,b) - \mathrm{EDP}^{C^*}(a,b)| \le |||[C]^2 - [C^*]^2|||_\infty$$

$$|\mathrm{ELP}^C(a,b) - \mathrm{ELP}^{C^*}(a,b)| \le 2\,|||[C]^2 - [C^*]^2|||_\infty.$$

The same inequalities hold with the $L_2$ norm. (They are consequences of the Cauchy-Schwarz inequality.) We can thus adapt Theorem 5 to the $L_2$ bounds without any further argument.
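The matrix expressions for EDP and ELP can be checked against their direct definitions. The sketch below assumes that the random cipher C is uniform over an explicit list of permutations of $\mathbb{Z}_2^m$, and it computes both sides exactly with fractions. For the Perfect Cipher on 2-bit blocks, EDP for nonzero a and b should equal $1/(2^m - 1) = 1/3$.

```python
import random
from fractions import Fraction
from itertools import permutations, product


def parity(v):
    return bin(v).count("1") & 1


def pair_entry(perms, x1, x2, y1, y2):
    """[C]^2_{x,y} for C uniform over the permutations in `perms`."""
    return Fraction(sum(1 for c in perms if c[x1] == y1 and c[x2] == y2), len(perms))


def edp_direct(perms, m, a, b):
    size = 1 << m
    return sum(Fraction(sum(1 for x in range(size) if c[x ^ a] == c[x] ^ b), size)
               for c in perms) / len(perms)


def edp_from_matrix(perms, m, a, b):
    size = 1 << m   # 2^-m * sum over x1 + x2 = a, y1 + y2 = b
    return sum(pair_entry(perms, x1, x1 ^ a, y1, y1 ^ b)
               for x1 in range(size) for y1 in range(size)) / size


def elp_direct(perms, m, a, b):
    size = 1 << m
    total = Fraction(0)
    for c in perms:
        q = Fraction(sum(1 for x in range(size) if parity(x & a) == parity(c[x] & b)), size)
        total += (2 * q - 1) ** 2
    return total / len(perms)


def elp_from_matrix(perms, m, a, b):
    size = 1 << m
    s = Fraction(0)
    for x1, x2, y1, y2 in product(range(size), repeat=4):
        if parity(x1 & a) == parity(y1 & b) and parity(x2 & a) != parity(y2 & b):
            s += pair_entry(perms, x1, x2, y1, y2)
    return 1 - Fraction(4, size * size) * s   # 1 - 2^(2-2m) * sum


perfect = list(permutations(range(4)))    # Perfect Cipher on 2-bit blocks
print(edp_direct(perfect, 2, 1, 2), edp_from_matrix(perfect, 2, 1, 2))  # 1/3 1/3

rng = random.Random(5)
sample = [tuple(rng.sample(range(8), 8)) for _ in range(5)]   # a toy random cipher on 3-bit blocks
print(edp_direct(sample, 3, 3, 5) == edp_from_matrix(sample, 3, 3, 5))  # True
```

The ELP identity rests on $(2q-1)^2 = 1 - 4q(1-q)$, where $q(1-q)$ is the probability that two independent uniform inputs satisfy and violate the linear relation, respectively.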

Theorem 6. Theorem 5 remains valid if we replace the $|||\cdot|||_\infty$ norm by the $\|\cdot\|_2$ norm.

This means that if $\varepsilon = \|[C]^2 - [C^*]^2\|_2$ is small (i.e. if $\varepsilon \le 2^{-m}$), the complexity of any basic differential or linear cryptanalysis is close to $2^m$, thus no more efficient than exhaustive search.

In the following sections we show how to construct a practical cipher with a relatively small $\|[C]^2 - [C^*]^2\|_2$. For this we first study how to bound the decorrelation $L_2$-bias of a Feistel Cipher from the decorrelation of its round functions. Then we construct round functions with a relatively small decorrelation $L_2$-bias and a corresponding dedicated cipher.

4 $L_2$-Decorrelation of Feistel Ciphers

Here we show how to measure the decorrelation $L_2$-bias of a Feistel Cipher from the decorrelation of its round functions. We first study the case of a 2-round Feistel Cipher.

Lemma 7. Let $\mathcal{M}_0$ be a group and let $\mathcal{M} = \mathcal{M}_0^2$. Let $F_1$, $F_2$, $F_1^*$ and $F_2^*$ be four independent random functions on $\mathcal{M}_0$ where $F_1^*$ and $F_2^*$ have a uniform distribution. If we have $\|[F_i]^d - [F_i^*]^d\|_2 \le \varepsilon$ for i = 1, 2, then we have

$$\|[\Psi(F_1,F_2)]^d - [\Psi(F_1^*,F_2^*)]^d\|_2 \le \varepsilon\sqrt{\varepsilon^2 + 2P_d}$$

where $P_d$ is the number of partitions of $\{1,\ldots,d\}$.

^ Those notations are inspired by Matsui's work. Actually, Matsui defined DP and LP, and we use here their expected values over the distribution of the cipher in order to measure the average complexity of the attacks.
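Proof. Let $x = (x_1,\ldots,x_d)$ (resp. $y = (y_1,\ldots,y_d)$) be a multi-point with $x_i = (x_i^l, x_i^r)$ (resp. $y_i = (y_i^l, y_i^r)$). We recall that the relation $y_i = \Psi(g_1,g_2)(x_i)$ means that $y_i^l = x_i^l + g_1(x_i^r)$ and $y_i^r = x_i^r + g_2(y_i^l)$. We thus have

$$\Pr[\Psi(G_1,G_2)(x_i) = y_i;\ \forall i] = \Pr[G_1(x_i^r) = y_i^l - x_i^l;\ \forall i]\ \Pr[G_2(y_i^l) = y_i^r - x_i^r;\ \forall i].$$

The one-to-one relation between $(x^l, x^r, y^l, y^r)$ and $(x^r, y^l - x^l, y^l, y^r - x^r)$ is an important point. In the following, we let $(t, u, v, w)$ denote this family. Let us write the previous equation

$$\Pr_{G_1G_2}[x \mapsto y] = \Pr_{G_1}[t \mapsto u]\ \Pr_{G_2}[v \mapsto w].$$

Let $\Delta\Pr_j$ denote $\Pr_{F_j} - \Pr_{F_j^*}$ with obvious notations. We have

$$\Delta\Pr[x \mapsto y] = \Delta\Pr_1[t \mapsto u]\,\Delta\Pr_2[v \mapsto w] + \Pr_{F_1^*}[t \mapsto u]\,\Delta\Pr_2[v \mapsto w] + \Pr_{F_2^*}[v \mapsto w]\,\Delta\Pr_1[t \mapsto u].$$

Now we have

$$\|[\Psi(F_1,F_2)]^d - [\Psi(F_1^*,F_2^*)]^d\|_2^2 = \sum_{x,y}\left(\Delta\Pr[x \mapsto y]\right)^2.$$

We note that

$$\sum_{t,u}\Pr_{F_1^*}[t \mapsto u]\,\Delta\Pr_1[t \mapsto u] = 0$$

(and a similar property holds for $\Pr_2$), thus we have

$$\|[\Psi(F_1,F_2)]^d - [\Psi(F_1^*,F_2^*)]^d\|_2^2 = e_1^2e_2^2 + e_1^2\sum_{v,w}\Pr_{F_2^*}[v \mapsto w]^2 + e_2^2\sum_{t,u}\Pr_{F_1^*}[t \mapsto u]^2$$

where $e_j = \|[F_j]^d - [F_j^*]^d\|_2$. Hence

$$\|[\Psi(F_1,F_2)]^d - [\Psi(F_1^*,F_2^*)]^d\|_2^2 \le \varepsilon^4 + 2\varepsilon^2\sum_{t,u}\Pr_{F^*}[t \mapsto u]^2.$$

For any partition $\mathcal{P} = \{O_1,\ldots,O_k\}$ of $\{1,\ldots,d\}$ into k parts, let

$$M_{\mathcal{P}} = \{t;\ t_i = t_j \iff \exists \ell\ \ i,j \in O_\ell\}.$$

We have

$$\sum_{t,u}\left(\Pr_{F^*}[t \mapsto u]\right)^2 = \sum_k\ \sum_{\substack{\mathcal{P}\text{ into}\\ k\text{ parts}}}\ \sum_{t \in M_{\mathcal{P}}}\sum_u \left(\Pr_{F^*}[t \mapsto u]\right)^2.$$

For $t \in M_{\mathcal{P}}$ there are $(\#\mathcal{M}_0)^k$ u-terms for which the probability is not zero. Namely it is $1/(\#\mathcal{M}_0)^k$. The number of t-terms which correspond to this partition is $\#\mathcal{M}_0(\#\mathcal{M}_0 - 1)\cdots(\#\mathcal{M}_0 - k + 1)$, thus

$$\sum_{t,u}\left(\Pr_{F^*}[t \mapsto u]\right)^2 = \sum_k\ \sum_{\substack{\mathcal{P}\text{ into}\\ k\text{ parts}}} \frac{\#\mathcal{M}_0(\#\mathcal{M}_0 - 1)\cdots(\#\mathcal{M}_0 - k + 1)}{(\#\mathcal{M}_0)^k}$$

which is less than $P_d$.

□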
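The counting argument at the end of this proof can be checked numerically. In the sketch below, `stirling2(d, k)` counts the partitions of $\{1,\ldots,d\}$ into k parts, so $P_d$ is a Bell number. `collision_sum` evaluates the final expression, and `brute_force_collision_sum` computes $\sum_{t,u}\Pr_{F^*}[t \mapsto u]^2$ directly for a uniformly random function on a tiny set. The set sizes are arbitrary toy choices and the names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def stirling2(d: int, k: int) -> int:
    """Number of partitions of {1, ..., d} into k non-empty parts."""
    if d == 0 and k == 0:
        return 1
    if d == 0 or k == 0:
        return 0
    return k * stirling2(d - 1, k) + stirling2(d - 1, k - 1)


def bell(d: int) -> int:
    """P_d, the number of partitions of {1, ..., d}."""
    return sum(stirling2(d, k) for k in range(d + 1))


def collision_sum(d: int, size: int) -> Fraction:
    """sum over partitions into k parts of size (size-1) ... (size-k+1) / size^k."""
    total = Fraction(0)
    for k in range(1, d + 1):
        falling = 1
        for i in range(k):
            falling *= size - i
        total += stirling2(d, k) * Fraction(falling, size ** k)
    return total


def brute_force_collision_sum(d: int, size: int) -> Fraction:
    """sum_{t,u} Pr_{F*}[t -> u]^2 with F* uniform over all functions on range(size)."""
    functions = list(product(range(size), repeat=size))
    total = Fraction(0)
    for t in product(range(size), repeat=d):
        counts = {}
        for f in functions:
            u = tuple(f[ti] for ti in t)
            counts[u] = counts.get(u, 0) + 1
        total += sum(Fraction(c, len(functions)) ** 2 for c in counts.values())
    return total


print(bell(2), bell(3), collision_sum(2, 3), brute_force_collision_sum(2, 3))  # 2 5 5/3 5/3
```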



In order to measure the decorrelation distance between a 2-round Feistel Cipher and the Perfect Cipher, we thus have to study the case of a truly random 2-round Feistel Cipher.

Lemma 8. Let $\mathcal{M}_0$ be a group and let $\mathcal{M} = \mathcal{M}_0^2$. Let $F_1^*$ and $F_2^*$ be two independent random functions on $\mathcal{M}_0$ with a uniform distribution and let $C^*$ be the Perfect Cipher on $\mathcal{M}$. We have

$$\|[\Psi(F_1^*,F_2^*)]^d - [C^*]^d\|_2 \le \sqrt{P_d(P_d - 1)}$$

where $P_d$ is the number of partitions of $\{1,\ldots,d\}$.

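Proof. With obvious notations we have

$$\|[\Psi(F_1^*,F_2^*)]^d - [C^*]^d\|_2^2 = \sum_{x,y}\left(\Pr_{\Psi(F_1^*,F_2^*)}[x \mapsto y] - \Pr_{C^*}[x \mapsto y]\right)^2.$$

The sums $\sum_{x,y}\Pr_{\Psi(F_1^*,F_2^*)}[x \mapsto y]\Pr_{C^*}[x \mapsto y]$ and $\sum_{x,y}\Pr_{C^*}[x \mapsto y]^2$ are both equal to $P_d$. (We observe it by fixing the partition associated to x and summing over all ys.) For the remaining sum, we use the same ideas as in the previous proof:

$$\sum_{x,y}\Pr_{\Psi(F_1^*,F_2^*)}[x \mapsto y]^2 = \sum_{t,u}\Pr_{F_1^*}[t \mapsto u]^2\ \sum_{v,w}\Pr_{F_2^*}[v \mapsto w]^2$$

which is less than $P_d^2$. Expanding the square, the whole sum is thus at most $P_d^2 - 2P_d + P_d = P_d(P_d - 1)$. □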

Lemma 8 may look useless because the decorrelation bias of $\Psi(F_1^*,F_2^*)$ is greater than one (so we cannot consider product ciphers and get efficient bounds). We can however use it to study the case of a 4-round Feistel Cipher. From Lemma 7, Lemma 8 and Equation (5) we obtain the following lemma in a straightforward way.

Lemma 9. Let $\mathcal{M}_0$ be a group and let $\mathcal{M} = \mathcal{M}_0^2$. Let $F_1,\ldots,F_4,F_1^*,\ldots,F_4^*$ be eight independent random functions on $\mathcal{M}_0$ where the $F_i^*$'s have a uniform distribution. If we have $\|[F_i]^d - [F_i^*]^d\|_2 \le \varepsilon$ for $i = 1,\ldots,4$, with $\varepsilon$ small enough, then we have

$$\|[\Psi(F_1,F_2,F_3,F_4)]^d - [\Psi(F_1^*,F_2^*,F_3^*,F_4^*)]^d\|_2 \le 2\sqrt{2}\,\varepsilon\,(P_d)^{3/2}$$

where $P_d$ is the number of partitions of $\{1,\ldots,d\}$.
It thus remains to study the decorrelation distance between a truly random 4-round Feistel Cipher and the Perfect Cipher: once we know that

$$\|[\Psi(F_1^*,\ldots,F_4^*)]^d - [C^*]^d\|_2 \le u_d$$

we obtain from Equation (3) that

$$\|[\Psi(F_1,\ldots,F_{4r})]^d - [C^*]^d\|_2 \le \left(2\sqrt{2}\,\varepsilon\,(P_d)^{3/2} + u_d\right)^r$$

where $\varepsilon = \max_i \|[F_i]^d - [F_i^*]^d\|_2$ is small enough. Unfortunately, the problem of obtaining a general result on the d-wise decorrelation of a truly random 4-round Feistel Cipher is still open.^ In the next section we propose a construction in the d = 2 case for which we can evaluate the decorrelation.



5 A Dedicated Construction 



In a general finite field GF(q), an obvious way to construct pairwise decorrelated functions (resp. permutations) consists of taking

$$F(x) = a.x + b$$

where (a, b) is a random pair uniformly distributed in $\mathrm{GF}(q)^2$ (resp. $\mathrm{GF}(q)^* \times \mathrm{GF}(q)$). Unfortunately, the traditional message space $\mathbb{Z}_2^m$ requires that we use finite fields of characteristic two. If we aim to implement a cipher in software on a modern microprocessor, it looks cumbersome to implement a poor characteristic-two multiplication since there already is a built-in integer multiplication. For this reason we can think of the

$$F(x) = ((ax + b) \bmod p) \bmod 2^{m/2}$$

imperfectly decorrelated function to be inserted at the input of each round function of a Feistel Cipher, where p is a prime close to $2^{m/2}$.

The (m, r, d, p)-PEANUT Cipher Family is defined to be the set of all r-round Feistel Ciphers over $\mathbb{Z}_2^m$ in which all round functions can be written

$$F(x) = g\left(\left(\sum_{i=1}^d k_i x^{d-i} \bmod p\right) \bmod 2^{m/2}\right)$$

where $(k_1,\ldots,k_d)$ is an (independent) round key which is uniformly distributed in $\{0,\ldots,2^{m/2}-1\}^d$, p is a prime, and g is a permutation. For $p \ge 2^{m/2}$, F has a friendly d-wise decorrelation $|||\cdot|||_\infty$-bias which is roughly $2d\delta$ when $p = (1+\delta)2^{m/2}$. For $p < 2^{m/2}$, the $|||\cdot|||_\infty$-decorrelation is poor for $d \ge 2$. For instance, in the case d = 2, with $p = (1-\delta)2^{m/2}$ and $x = (0, p)$, we have

$$\sum_{y=(y_1,y_2)}\left|\Pr\left[\begin{array}{l} g(k_2 \bmod p) = y_1\\ g(k_2 \bmod p) = y_2\end{array}\right] - \Pr\left[\begin{array}{l} F^*(0) = y_1\\ F^*(p) = y_2\end{array}\right]\right| = 2 - 2^{1-\frac{m}{2}}(1-\delta).$$

^ This problem has been solved with the $|||\cdot|||_\infty$ norm. This is why the $L_2$ norm looks less friendly.
Hence $|||[F]^2 - [F^*]^2|||_\infty \approx 2$. The $p < 2^{m/2}$ case can however be studied with the $L_2$ norm. In the following, we consider a PEANUT Cipher construction with d = 2 and $p < 2^{m/2}$.
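The value $2 - 2^{1-m/2}(1-\delta)$ can be reproduced by enumerating the round key. This sketch writes $N = 2^{m/2}$ and treats g as the identity, which is harmless here because a permutation only relabels the outputs. Since 0 and p are congruent modulo p, only $k_2$ matters. The (N, p) pairs are toy choices with p prime and $N/2 < p < N$.

```python
from fractions import Fraction


def row_sum_at_0_p(N: int, p: int) -> Fraction:
    """sum_y |[F]^2_{x,y} - [F*]^2_{x,y}| for x = (0, p), F(x) = (k1*x + k2) mod p mod N."""
    counts = [0] * N
    for k2 in range(N):              # F(0) = F(p) = k2 mod p; k1 plays no role
        counts[k2 % p] += 1
    inv_n2 = Fraction(1, N * N)      # [F*]^2_{x,y} = 1/N^2 for every y
    diagonal = sum(abs(Fraction(c, N) - inv_n2) for c in counts)   # y1 = y2
    off_diagonal = (N * N - N) * inv_n2                             # y1 != y2: F never lands there
    return diagonal + off_diagonal


def closed_form(N: int, p: int) -> Fraction:
    delta = Fraction(N - p, N)
    return 2 - Fraction(2, N) * (1 - delta)


print(row_sum_at_0_p(16, 13), closed_form(16, 13))  # 243/128 243/128
```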

Lemma 10. Let A and B be two independent random variables with a uniform distribution over $\{0,\ldots,2^{m/2}-1\}$. We let $F(x) = Ax + B \bmod p$ where $p = (1-\delta)2^{m/2}$ is a prime with $0 < \delta \le 1/14$. Let $F^*$ be a random function uniformly distributed over the same set. We have

$$\|[F]^2 - [F^*]^2\|_2 \le 2\sqrt{2\delta}.$$

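Proof. We let $N = 2^{m/2}$. We want to upper bound the sum

$$\sum_{\substack{x=(x_1,x_2)\\ y=(y_1,y_2)}}\left([F]^2_{x,y} - [F^*]^2_{x,y}\right)^2.$$

Table 1 shows all the possible cases for x and y, the number of times they occur and an upper bound for the probability difference.

For instance, if $x_1 = x_2 \not\equiv 0 \pmod p$ and $y_1 = y_2 < p$, we have

$$[F]^2_{(x_1,x_2),(y_1,y_2)} = \Pr[Ax_1 + B \bmod p = y_1]$$

and $[F^*]^2_{(x_1,x_2),(y_1,y_2)} = N^{-1}$. We let a (resp. b, c, d) be the number of $(A \bmod p, B \bmod p)$ pairs such that $Ax_1 + B \bmod p = y_1$ and

- $A \bmod p < \delta N$ and $B \bmod p < \delta N$ (resp.
- $A \bmod p < \delta N$ and $B \bmod p \ge \delta N$,
- $A \bmod p \ge \delta N$ and $B \bmod p < \delta N$,
- $A \bmod p \ge \delta N$ and $B \bmod p \ge \delta N$).

We have $a + b = \delta N$, $a + c = \delta N$ and $a + b + c + d = p$. Since each residue below $\delta N$ is taken by two values of A (resp. B) in $\{0,\ldots,N-1\}$ and each other residue by a single value, we obtain

$$[F]^2_{(x_1,x_2),(y_1,y_2)} = \frac{4a + 2b + 2c + d}{N^2} = \frac{N + \delta N + a}{N^2}.$$

Since we have $0 \le a \le \delta N$, we have

$$(1+\delta)N^{-1} \le [F]^2_{(x_1,x_2),(y_1,y_2)} \le (1+2\delta)N^{-1}.$$

The $x_1 \not\equiv x_2 \pmod p$ case is split into four cases which depend on x and y. The last case is $y_1 \ge p$ or $y_2 \ge p$, for which $[F]^2_{x,y} = 0$. The three other cases correspond to cases on $(A \bmod p, B \bmod p)$, which is uniquely determined by $y_1 = Ax_1 + B \bmod p$ and $y_2 = Ax_2 + B \bmod p$.

Case 1: $A \bmod p < \delta N$, $B \bmod p < \delta N$. We have $[F]^2_{x,y} = 4N^{-2}$.

Case 2: exactly one of $A \bmod p$ and $B \bmod p$ is less than $\delta N$. We have $[F]^2_{x,y} = 2N^{-2}$.

Case 3: $A \bmod p \ge \delta N$, $B \bmod p \ge \delta N$. We have $[F]^2_{x,y} = N^{-2}$.

We can now upper bound the whole sum. We obtain that the squared decorrelation bias $\|[F]^2 - [F^*]^2\|_2^2$ is less than

$$7\delta + 14\delta^2 - 4\delta^3 - 8\delta^4 - \frac{6\delta + 24\delta^2 - 16\delta^3}{N} + \frac{4\delta - 8\delta^2}{N^2}$$

which is less than $8\delta$ when $\delta \le 1/14$.

□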
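case x                                        | case y                          | num. x                | num. y                    | $|[F]^2_{x,y} - [F^*]^2_{x,y}| \le$
$x_1 = x_2 \equiv 0$                          | $y_1 = y_2 < \delta N$          | 2                     | $\delta N$                | $N^{-1}$
                                              | $y_1 = y_2 \ge (1-\delta)N$     | 2                     | $\delta N$                | $N^{-1}$
                                              | other cases                     | 2                     | $N^2 - 2\delta N$         | 0
$x_1 = x_2 \not\equiv 0$                      | $y_1 = y_2 \ge (1-\delta)N$     | $N - 2$               | $\delta N$                | $N^{-1}$
                                              | $y_1 = y_2 < (1-\delta)N$       | $N - 2$               | $(1-\delta)N$             | $2\delta N^{-1}$
                                              | $y_1 \ne y_2$                   | $N - 2$               | $N^2 - N$                 | 0
$x_1 \ne x_2$, $x_1 \equiv x_2 \equiv 0$      | $y_1 = y_2 < \delta N$          | 2                     | $\delta N$                | $2N^{-1} - N^{-2}$
                                              | $y_1 = y_2 \ge (1-\delta)N$     | 2                     | $\delta N$                | $N^{-2}$
                                              | other $y_1 = y_2$               | 2                     | $(1-2\delta)N$            | $N^{-1} - N^{-2}$
                                              | $y_1 \ne y_2$                   | 2                     | $N^2 - N$                 | $N^{-2}$
$x_1 \ne x_2$, $x_1 \equiv x_2 \not\equiv 0$  | $y_1 = y_2 < (1-\delta)N$       | $2\delta N - 2$       | $(1-\delta)N$             | $(1+2\delta)N^{-1} - N^{-2}$
                                              | other cases                     | $2\delta N - 2$       | $N^2 - (1-\delta)N$       | $N^{-2}$
$x_1 \not\equiv x_2$                          | case 1                          | $N^2 - (1+2\delta)N$  | $\delta^2 N^2$            | $3N^{-2}$
                                              | case 2                          | $N^2 - (1+2\delta)N$  | $2(1-2\delta)\delta N^2$  | $N^{-2}$
                                              | case 3                          | $N^2 - (1+2\delta)N$  | $(1-2\delta)^2 N^2$       | 0
                                              | $y_1$ or $y_2 \ge (1-\delta)N$  | $N^2 - (1+2\delta)N$  | $(2-\delta)\delta N^2$    | $N^{-2}$

Table 1. Decorrelation of $A.x + B \bmod (1-\delta)N$ (congruences are modulo p)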
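As a numerical cross-check of Lemma 10 and of the bound obtained from Table 1, the following sketch computes $\|[F]^2 - [F^*]^2\|_2^2$ in floating point for the toy parameters N = 32 and p = 31, so that $\delta = 1/32 \le 1/14$. It enumerates all (A, B) for every input pair, which takes on the order of a million steps. It then compares the result with the table-based expression and with $8\delta$. The parameters and function names are illustrative choices.

```python
import math
from collections import Counter


def lemma10_bias_squared(N: int, p: int) -> float:
    """||[F]^2 - [F*]^2||_2^2 for F(x) = A*x + B mod p, A and B uniform in range(N)."""
    inv_n2 = 1.0 / (N * N)
    total = 0.0
    for x1 in range(N):
        for x2 in range(N):
            counts = Counter()
            for a in range(N):
                ax1, ax2 = a * x1, a * x2
                for b in range(N):
                    counts[((ax1 + b) % p, (ax2 + b) % p)] += 1
            if x1 == x2:
                f_star = 1.0 / N                  # [F*]^2 = 1/N if y1 = y2, else 0
                row = N * f_star ** 2             # contribution if [F]^2 were zero on the row
                for c in counts.values():         # here every key has y1 = y2
                    row += (c * inv_n2 - f_star) ** 2 - f_star ** 2
            else:
                row = N * N * inv_n2 ** 2         # [F*]^2 = 1/N^2 for every y
                for c in counts.values():
                    row += (c * inv_n2 - inv_n2) ** 2 - inv_n2 ** 2
            total += row
    return total


def lemma10_table_bound(N: int, p: int) -> float:
    """The upper bound obtained by summing num. x * num. y * (difference bound)^2 over Table 1."""
    d = (N - p) / N
    return (7 * d + 14 * d ** 2 - 4 * d ** 3 - 8 * d ** 4
            - (6 * d + 24 * d ** 2 - 16 * d ** 3) / N + (4 * d - 8 * d ** 2) / N ** 2)


N, p = 32, 31
delta = (N - p) / N
bias_sq = lemma10_bias_squared(N, p)
print(bias_sq, lemma10_table_bound(N, p), 8 * delta)
print(math.sqrt(bias_sq), 2 * math.sqrt(2 * delta))
```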



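Lemma 11. Let $\mathcal{M} = \mathbb{Z}_2^m$. Let $F_1^*,\ldots,F_4^*$ be four independent random functions on $\mathbb{Z}_2^{m/2}$ with a uniform distribution and let $C^*$ be the Perfect Cipher on $\mathcal{M}$. We have

$$\|[\Psi(F_1^*,\ldots,F_4^*)]^2 - [C^*]^2\|_2 \le \sqrt{2 \cdot 2^{-m} + 4 \cdot 2^{-\frac{3m}{2}}}.$$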



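Proof. For each input pair $x = (x_1, x_2)$ we have $x_i = (x_i^l, x_i^r)$. Similarly, for each output pair $y = (y_1, y_2)$ we have $y_i = (y_i^l, y_i^r)$. All (x, y) pairs can be split into 10 cases:

1. $y_1^l \ne y_2^l$, $x_1^r \ne x_2^r$
2. $y_1^l \ne y_2^l$, $x_1^r = x_2^r$, $x_1^l \oplus x_2^l \notin \{0, y_1^l \oplus y_2^l\}$
3. $y_1^l \ne y_2^l$, $x_1^r = x_2^r$, $x_1^l \oplus x_2^l = y_1^l \oplus y_2^l$
4. $y_1^l \ne y_2^l$, $x_1 = x_2$
5. $y_1^l = y_2^l$, $y_1^r \ne y_2^r$, $x_1^r \oplus x_2^r \notin \{0, y_1^r \oplus y_2^r\}$
6. $y_1^l = y_2^l$, $y_1^r \ne y_2^r$, $x_1^r \oplus x_2^r = y_1^r \oplus y_2^r$
7. $y_1^l = y_2^l$, $y_1^r \ne y_2^r$, $x_1^r = x_2^r$, $x_1^l \ne x_2^l$
8. $y_1^l = y_2^l$, $y_1^r \ne y_2^r$, $x_1 = x_2$
9. $y_1 = y_2$, $x_1 = x_2$
10. $y_1 = y_2$, $x_1 \ne x_2$

Each case requires a dedicated study.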
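We consider a truly random 2r-round Feistel cipher for $r \ge 1$ denoted $C = \Psi(F_1^*,\ldots,F_{2r}^*)$. We have

$$A^r_i = \Pr[C(x_1) = y_1, C(x_2) = y_2] = \begin{cases}
\frac{1}{N^2(N^2-1)}\left(1 - \frac{1}{N^{2r}}\right) & \text{if case 1}\\
\frac{1}{N^2(N^2-1)}\left(1 - \frac{1}{N^{r-1}} - \frac{1}{N^r} + \frac{1}{N^{2r-1}}\right) & \text{if case 2}\\
\frac{1}{N^2(N^2-1)}\left(1 + \frac{1}{N^{r-2}} - \frac{1}{N^{r-1}} - \frac{2}{N^r} + \frac{1}{N^{2r-1}}\right) & \text{if case 3}\\
0 & \text{if case 4}\\
\frac{1}{N^2(N^2-1)}\left(1 - \frac{1}{N^{r-1}} - \frac{1}{N^r} + \frac{1}{N^{2r-1}}\right) & \text{if case 5}\\
\frac{1}{N^2(N^2-1)}\left(1 + \frac{1}{N^{r-2}} - \frac{1}{N^{r-1}} - \frac{2}{N^r} + \frac{1}{N^{2r-1}}\right) & \text{if case 6}\\
\frac{1}{N^2(N^2-1)}\left(1 - \frac{1}{N^{2r-2}}\right) & \text{if case 7}\\
0 & \text{if case 8}\\
\frac{1}{N^2} & \text{if case 9}\\
0 & \text{if case 10}
\end{cases}$$

where $N = 2^{m/2}$. We prove this by an easy induction. Namely we show that

$$\begin{pmatrix} A^r_1\\ A^r_2\\ A^r_3 \end{pmatrix} = \begin{pmatrix} 1 - \frac{N-1}{N^2} & \frac{N-2}{N^2} & \frac{1}{N^2}\\ 1 - \frac{1}{N} & \frac{1}{N} & 0\\ 1 - \frac{1}{N} & 0 & \frac{1}{N} \end{pmatrix}^{r-1} \begin{pmatrix} N^{-4}\\ 0\\ N^{-3} \end{pmatrix}$$

and

$$\begin{pmatrix} A^r_5\\ A^r_6\\ A^r_7 \end{pmatrix} = \begin{pmatrix} 1 - \frac{2(N-1)}{N^2} & \frac{N-1}{N^2} & \frac{N-1}{N^2}\\ \frac{(N-1)(N-2)}{N^2} & \frac{2N-1}{N^2} & \frac{N-1}{N^2}\\ 1 - \frac{2}{N} & \frac{1}{N} & \frac{1}{N} \end{pmatrix}^{r-1} \begin{pmatrix} 0\\ N^{-3}\\ 0 \end{pmatrix}.$$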

For instance, if r = 1 and $y_1^l \ne y_2^l$, $x_1^r = x_2^r$, $x_1^l \oplus x_2^l = y_1^l \oplus y_2^l$ (case 3), the probability $N^{-3}$ corresponds to the fact that $F_1^*(x_1^r)$ XORs the good value on both $x_1^l$ and $x_2^l$ (with probability $1/N$) and that both $F_2^*(y_1^l)$ and $F_2^*(y_2^l)$ XOR the good values on $x_1^r$ and $x_2^r$ respectively (with probability $1/N^2$).

To prove the matrix relations, we let x denote the input of C, y denote the output of the first two rounds and z denote the output. We have

$$z = \Psi(F_1^*, F_2^*, F_3^*, \ldots, F_{2r}^*)(x) = \Psi(F_3^*, \ldots, F_{2r}^*)(y)$$

$$(y^l, y^r) = \Psi(F_1^*, F_2^*)(x).$$

For instance, the transition from case 2 ($z_1^l \ne z_2^l$, $y_1^r = y_2^r$, $y_1^l \oplus y_2^l \notin \{0, z_1^l \oplus z_2^l\}$) to case 1 ($z_1^l \ne z_2^l$, $x_1^r \ne x_2^r$) corresponds to the $N(N-2)$ possibilities for $y_1^l$ and $y_2^l$ (all but $y_1^l = y_2^l$ or $y_1^l \oplus y_2^l = z_1^l \oplus z_2^l$), all with probability $1/N^2$ (since $F_1^*(x_1^r)$ and $F_1^*(x_2^r)$ are independent), mixed with the N possibilities for $y_1^r = y_2^r$, all with probability $1/N^2$, which gives $(N-2)/N^2$. This means that $A^r_1$ includes a term $\frac{N-2}{N^2}A^{r-1}_2$ which represents all possible ys coming from case 2.

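With this result we can compute the pairwise decorrelation bias of C. We have

$$\|[C]^2 - [C^*]^2\|_2^2 = n_1(\Delta A^r_1)^2 + n_2(\Delta A^r_2)^2 + n_3(\Delta A^r_3)^2 + n_5(\Delta A^r_5)^2 + n_6(\Delta A^r_6)^2 + n_7(\Delta A^r_7)^2$$

where $\Delta A^r_i = A^r_i - \frac{1}{N^2(N^2-1)}$ and $n_i$ is the number of (x, y) pairs in case i, namely $n_1 = N^6(N-1)^2$, $n_2 = n_5 = N^5(N-1)(N-2)$, $n_3 = n_6 = N^5(N-1)$ and $n_7 = N^4(N-1)^2$. (Cases 4, 8, 9 and 10 contribute nothing since C and $C^*$ agree there.) We obtain

$$\|[C]^2 - [C^*]^2\|_2^2 = 2N^{2-2r} - 4N^{1-2r} + N^{2-4r}.$$

For r = 2 (four rounds), this is less than $2N^{-2} + 4N^{-3}$, which proves the lemma since $N = 2^{m/2}$.

□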
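The computation for the truly random Feistel cipher can be checked by brute force in the smallest case N = 2 (1-bit halves, 2-bit blocks). The sketch builds the pairwise distribution matrix of one round with a uniformly random round function. Since the rounds are independent, it raises this matrix to the number of rounds. It then compares the squared $L_2$ distance to the Perfect Cipher with $2N^{2-2r} - 4N^{1-2r} + N^{2-4r}$ (the value obtained by summing the case probabilities) and with the squared bound of Lemma 11. Encoding a message as `l * n + r` is an implementation choice.

```python
from fractions import Fraction
from itertools import product


def round_matrix(n):
    """[R]^2 for one round (l, r) -> (r, l XOR F(r)), F uniform over all functions on
    range(n) (n a power of two). Message (l, r) -> l * n + r; pair (x1, x2) -> x1 * n^2 + x2."""
    size = n * n
    funcs = list(product(range(n), repeat=n))
    weight = Fraction(1, len(funcs))
    mat = [[Fraction(0)] * (size * size) for _ in range(size * size)]
    for f in funcs:
        out = [r * n + (l ^ f[r]) for l in range(n) for r in range(n)]
        for x1, x2 in product(range(size), repeat=2):
            mat[x1 * size + x2][out[x1] * size + out[x2]] += weight
    return mat


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def perfect_matrix(size):
    """[C*]^2 for the Perfect Cipher on `size` messages."""
    mat = [[Fraction(0)] * (size * size) for _ in range(size * size)]
    for x1, x2, y1, y2 in product(range(size), repeat=4):
        if x1 == x2 and y1 == y2:
            mat[x1 * size + x2][y1 * size + y2] = Fraction(1, size)
        elif x1 != x2 and y1 != y2:
            mat[x1 * size + x2][y1 * size + y2] = Fraction(1, size * (size - 1))
    return mat


def feistel_bias_squared(n, n_rounds):
    """||[Psi(F_1*, ..., F_k*)]^2 - [C*]^2||_2^2 with independent uniform round functions."""
    one_round = round_matrix(n)
    mat = one_round
    for _ in range(n_rounds - 1):
        mat = matmul(mat, one_round)
    ideal = perfect_matrix(n * n)
    return sum((u - v) ** 2 for row_a, row_b in zip(mat, ideal) for u, v in zip(row_a, row_b))


def closed_form(n, r):
    """2N^(2-2r) - 4N^(1-2r) + N^(2-4r) for a 2r-round cipher with N = n."""
    big_n = Fraction(n)
    return 2 * big_n ** (2 - 2 * r) - 4 * big_n ** (1 - 2 * r) + big_n ** (2 - 4 * r)


for r in (1, 2):
    print(2 * r, feistel_bias_squared(2, 2 * r), closed_form(2, r))
```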



We can now define the PEANUT97 Cipher construction. It consists of an (m, 4r, 2, p)-PEANUT Cipher, i.e. a 4r-round Feistel Cipher on m-bit message blocks which is characterized by some prime $p < 2^{m/2}$. Each round function of the cipher must be of the form

$$F_i(x) = g_i(K_{2i-1}x + K_{2i} \bmod p)$$

where $(K_1,\ldots,K_{8r})$ is uniformly distributed in $\{0,\ldots,2^{m/2}-1\}^{8r}$ and $g_i$ is a (possibly independently keyed) permutation on the $m/2$-bit strings. Lemmata 9, 10 and 11 prove the following theorem.

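Theorem 12. Let C be an (m, 4r, 2, p)-PEANUT97 Cipher such that $p = (1-\delta)2^{m/2}$ with $\delta > 0$ small enough. Let $C^*$ be the Perfect Cipher. We have

$$\|[C]^2 - [C^*]^2\|_2 \le \left(16\sqrt{2\delta} + \sqrt{2 \cdot 2^{-m} + 4 \cdot 2^{-\frac{3m}{2}}}\right)^r.$$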

For instance, with m = 64 and $p = 2^{32} - 5$, we obtain $\|[C]^2 - [C^*]^2\|_2 \le 2^{-10.3}$ for r = 1. Thus for r = 7 we have $\|[C]^2 - [C^*]^2\|_2 \le 2^{-72}$. Theorem 6 thus shows that $|p - p^*| \le 0.1$ for any differential distinguisher with complexity $n \le 2^{60}$ and any linear distinguisher with complexity $n \le 2^{44}$.
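The numbers in this example can be reproduced with a few lines of Python. The sketch evaluates the bound of Theorem 12 for m = 64, $p = 2^{32} - 5$ and r = 1 and r = 7. It then plugs the 28-round value into the bounds of Theorem 5, which Theorem 6 makes valid for the $L_2$ norm. Evaluating the linear bound at a finite m is an assumption, because Theorem 5 only states it as a limit.

```python
import math


def theorem12_bound(m: int, delta: float, r: int) -> float:
    """(16 sqrt(2 delta) + sqrt(2 * 2^-m + 4 * 2^(-3m/2)))^r."""
    base = 16 * math.sqrt(2 * delta) + math.sqrt(2 * 2.0 ** -m + 4 * 2.0 ** (-1.5 * m))
    return base ** r


def differential_advantage_bound(m: int, n: float, eps: float) -> float:
    return n / (2 ** m - 1) + n * eps


def linear_advantage_estimate(m: int, n: float, eps: float) -> float:
    # Heuristic: right-hand side of the asymptotic bound of Theorem 5, evaluated at finite m.
    return 9.3 * (1 / (2 ** m - 1) + 2 * eps) ** (1 / 3) * n ** (1 / 3)


m, p = 64, 2 ** 32 - 5
delta = (2 ** 32 - p) / 2 ** 32
eps4 = theorem12_bound(m, delta, 1)    # 4 rounds
eps28 = theorem12_bound(m, delta, 7)   # 28 rounds
print(math.log2(eps4), math.log2(eps28))
print(differential_advantage_bound(m, 2 ** 60, eps28), linear_advantage_estimate(m, 2 ** 44, eps28))
```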

This PEANUT97 construction has been tested on a Pentium in assembly code. A 28-round 64-bit encryption required less than 790 clock cycles, which yields an encryption rate of 23Mbps working at 300MHz. The table below compares it with the PEANUT98 construction, for which the $|||\cdot|||_\infty$-decorrelation theory makes it possible to decrease the number of rounds, and with the DFC AES candidate, which is a PEANUT98 128-bit block cipher. All ciphers have similar security against differential and linear cryptanalysis. We remark that one PEANUT97 round is much faster than the rounds of the other ciphers, so PEANUT97 may be faster than PEANUT98 if we can get tighter bounds in order to decrease the number of rounds.



cipher                  | PEANUT97           | PEANUT98                                  | DFC
block length            | 64                 | 64                                        | 128
number of rounds        | 28                 | 9                                         | 8
cycles/encryption       | 788                | 396                                       | 754
cycles/round            | 28                 | 44                                        | 94
enc. rate at 300MHz     | 23Mbps             | 46Mbps                                    | 49Mbps
pairwise decorrelation  | $2^{-72}$ ($L_2$)  | $2^{-\ldots}$ ($|||\cdot|||_\infty$)      | $2^{-\ldots}$ ($|||\cdot|||_\infty$)
reference               | here               | [...]                                     | [...]
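For concreteness, here is a toy PEANUT97-style cipher in Python. The parameters are scaled down (16-bit blocks, p = 251, 28 rounds) and the round keys come from a seeded PRNG. The permutation $g_i$ is a hypothetical affine map on 8-bit strings. None of these choices come from the paper. The sketch only shows the structure $F_i(x) = g_i(K_{2i-1}x + K_{2i} \bmod p)$ inside a Feistel network and is neither secure nor a reference implementation.

```python
import random
from typing import Callable, List, Sequence

RoundFunction = Callable[[int], int]


def make_round_function(k_odd: int, k_even: int, p: int, g: Callable[[int], int]) -> RoundFunction:
    """F_i(x) = g_i(K_{2i-1} * x + K_{2i} mod p)."""
    return lambda x: g((k_odd * x + k_even) % p)


def toy_g(v: int) -> int:
    """Hypothetical permutation of 8-bit strings (odd multiplier mod 256); not from the paper."""
    return (167 * v + 13) % 256


def toy_peanut97(keys: Sequence[int], p: int) -> List[RoundFunction]:
    """One round function per key pair (K_{2i-1}, K_{2i}); the same toy g in every round."""
    return [make_round_function(keys[2 * i], keys[2 * i + 1], p, toy_g)
            for i in range(len(keys) // 2)]


def encrypt(rounds: Sequence[RoundFunction], block: int, half_bits: int) -> int:
    mask = (1 << half_bits) - 1
    left, right = block >> half_bits, block & mask
    for f in rounds:
        left, right = right, left ^ f(right)
    return (left << half_bits) | right


def decrypt(rounds: Sequence[RoundFunction], block: int, half_bits: int) -> int:
    mask = (1 << half_bits) - 1
    left, right = block >> half_bits, block & mask
    for f in reversed(rounds):
        left, right = right ^ f(left), left
    return (left << half_bits) | right


half_bits, p, r = 8, 251, 7                  # toy sizes: m = 16, p = (1 - 5/256) * 2^8
rng = random.Random(97)
keys = [rng.randrange(1 << half_bits) for _ in range(8 * r)]   # K_1, ..., K_{8r}
rounds = toy_peanut97(keys, p)               # 4r = 28 rounds
c = encrypt(rounds, 0x1234, half_bits)
print(hex(c), decrypt(rounds, c, half_bits) == 0x1234)
```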
6 Conclusion

We have shown how to use the $ax + b \bmod p$ pairwise decorrelation primitive for $p < 2^{m/2}$. It requires that we use the $L_2$ norm in the decorrelation technique, which leads to more complicated computations than for the $|||\cdot|||_\infty$ norm.

When used at the input of Feistel Ciphers, this primitive makes it possible to protect them against differential and linear cryptanalysis. For 64-bit message blocks, it however requires at least 28 rounds.

Some extensions of the $|||\cdot|||_\infty$-decorrelation results to $L_2$-decorrelation are still open: it is not clear how to state results with higher degrees of decorrelation (d > 2) and how to prove the security of decorrelated ciphers against general iterated attacks as in [3].
Abstract. This paper discusses a method of enhancing the security of
block ciphers which use s-boxes, a group which includes the ciphers DES, 
CAST-128, and Blowfish. We focus on CAST-128 and consider Blowfish; 
Biham and Biryukov [2] have made some similar proposals for DES. 
The method discussed uses bits of the primary key to directly manipu- 
late the s-boxes in such a way that their contents are changed but their 
cryptographic properties are preserved. Such a strategy appears to sig- 
nificantly strengthen the cipher against certain attacks, at the expense of 
a relatively modest one-time computational procedure during the set-up 
phase. Thus, a stronger cipher with identical encryption / decryption 
performance characteristics may be constructed with little additional 
overhead or computational complexity. 



1 Introduction 

Both carefully-constructed and randomly-generated s-boxes have a place in symmetric cipher design. Typically, a given cipher will use one or the other paradigm in its encryption “engine”. This paper suggests that a mixture of the two paradigms may yield beneficial results in some environments. In our examples, we use the four 8 × 32 s-boxes which the Blowfish and CAST-128 ciphers employ, but variations of this technique could be applied to any cipher using s-boxes, whatever their number and sizes.

We propose using strong s-boxes and applying key-dependent operations to 
them at the time of key scheduling, before the actual encryption begins. The goal 
is to get the benefits of strong s-boxes (as in CAST-128) and of key-dependent 
s-boxes (as in Blowfish) without the drawbacks of either. 

The technique can be powerful. If a basic cipher can be broken in a second by exhaustive search over the key space and if key-dependent operations on the s-boxes add 32 bits to the effective key length, then breaking the improved cipher by brute force takes $2^{32}$ seconds (just over a century). If these operations add 64 effective bits, then it would take $2^{32}$ centuries to break the improved cipher.

Key-dependent operations on s-boxes can use large numbers of bits. For 8 x 32 
s-boxes, XORing constants into the inputs and outputs can use 40 bits per s-box. 

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 15ff., 1999.

© Springer-Verlag Berlin Heidelberg 1999 
Permuting the inputs and outputs can use $\log_2(8!) + \log_2(32!) > 130$ bits per s-box. A cipher with four such s-boxes could use 680 bits of additional key with these operations (although it is recognized that this will not necessarily be the increase in the effective key length).
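The key-bit counts above are easy to recompute. The sketch below also illustrates two of the operations mentioned: XORing constants into the input and output of an 8 × 32 s-box, and permuting its output bits. The transformation $S'(x) = P(S(x \oplus a)) \oplus b$, the random stand-in s-box and all names are illustrative assumptions, not the procedure proposed for CAST-128. The check only confirms that the largest entry of the XOR difference distribution is unchanged, which is one example of a cryptographic property preserved by such operations.

```python
import math
import random
from typing import List, Sequence, Tuple


def extra_key_bits(in_bits: int = 8, out_bits: int = 32) -> Tuple[int, float]:
    """Bits per s-box used by XOR masks and by input/output bit permutations."""
    xor_bits = in_bits + out_bits
    perm_bits = math.log2(math.factorial(in_bits)) + math.log2(math.factorial(out_bits))
    return xor_bits, perm_bits


def transform_sbox(sbox: Sequence[int], in_mask: int, out_mask: int,
                   out_perm: Sequence[int]) -> List[int]:
    """Hypothetical transform S'(x) = P(S(x XOR a)) XOR b, P moving output bit i to out_perm[i]."""
    def permute_bits(v: int) -> int:
        return sum(((v >> i) & 1) << out_perm[i] for i in range(len(out_perm)))
    return [permute_bits(sbox[x ^ in_mask]) ^ out_mask for x in range(len(sbox))]


def max_differential_count(sbox: Sequence[int]) -> int:
    """Largest number of x with S(x) XOR S(x XOR dx) = dy, over dx != 0 and all dy."""
    best = 0
    for dx in range(1, len(sbox)):
        counts = {}
        for x in range(len(sbox)):
            dy = sbox[x] ^ sbox[x ^ dx]
            counts[dy] = counts.get(dy, 0) + 1
        best = max(best, max(counts.values()))
    return best


xor_bits, perm_bits = extra_key_bits()
print(xor_bits, round(perm_bits, 1))        # 40 133.0

rng = random.Random(128)
sbox = [rng.getrandbits(32) for _ in range(256)]   # random stand-in for a fixed s-box
perm = list(range(32))
rng.shuffle(perm)
new_sbox = transform_sbox(sbox, rng.randrange(256), rng.getrandbits(32), perm)
print(max_differential_count(sbox) == max_differential_count(new_sbox))  # True
```

The XOR masks and the bit permutation are linear over XOR, so they only relabel the input and output differences. This is why the difference distribution is preserved.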

We start with the CAST-128 cipher and propose using between 148 and 
256 additional key bits for s-box transformations. The increase in effective key 
length (although difficult to compute precisely) is likely to be considerably lower 
than this, but the transformations still appear to be worthwhile in at least some 
applications. In particular, the cost is moderate and the resulting cipher appears 
to be more resistant to attacks that rely upon knowledge of the s-box contents. 

It is important to note that the technique is inherently efficient in one sense: 
all the s-box transformations are done at set-up time. Thus, there is no increase 
in the per-round or per-block encryption time of the strengthened cipher. 

2 Considerations 

2.1 The Extra Key Bits 

The additional bits required for this proposal may come from one of two sources: 
derived key bits or primary key bits. As an example of the former, CAST-128 
[1] expands the 128-bit key to 1024 bits but does not use all of them. The actual 
encryption uses 37 bits per round (592 in the full 16 rounds) so that 432 bits 
are generated by the key scheduling algorithm but are unused by the cipher. 
These currently unused bits may be good candidates for the bits needed for 
s-box manipulation. 

Alternatively, additional primary key bits may be used for the proposal in this 
paper. This has the advantage of increasing the key space for exhaustive search 
attacks, at the cost of increased key storage and bandwidth requirements. Note, 
however, that in some environments the bandwidth required for key transfer or 
key agreement protocols need not increase. In one common use of symmetric 
ciphers, session keys are transmitted using a public key method such as RSA [5]
or Diffie-Hellman [3]. Public-key algorithms operate on large blocks, so to
transmit a 128-bit session key you may need to encrypt, transmit and decrypt a
full public-key block of 1024 bits or more. In such a case, any key up to several
hundred bits can be transmitted at no additional cost compared with a 128-bit
key.

Using derived key bits has no impact on primary key size, but depends upon a 
key scheduling algorithm that generates extra (i.e., currently unused) bits. Using 
primary key bits places no such requirement on the key scheduling algorithm, but 
has storage and bandwidth implications, and may show some susceptibility to 
chosen-key-type attacks (since the two pieces of the primary key are “separable” 
in some sense). 

2.2 CAST’s Strong S-Boxes 

The CAST design procedure uses fixed s-boxes in the construction of each
specific CAST cipher. This allows implementers to build strong s-boxes, using bent
Boolean functions for the columns and choosing combinations of columns for
high levels of s-box nonlinearity and for other desirable properties. Details can
be found in Mister and Adams [4].

The CAST-128 s-boxes, for example, appear to be strong, but they are fixed
and publicly known. This may allow some theoretical attacks (e.g., linear or
differential cryptanalysis) to be mounted against a given CAST cipher that
uses these s-boxes, although the computational cost of these attacks can be
made infeasibly high by a suitable choice of the number of rounds.



2.3 Blowfish’s Key-Dependent S-Boxes 

Blowfish generates key-dependent s-boxes at cipher set-up time. This means the 
attacker cannot know the s-boxes, short of breaking the algorithm that generates 
them. 

There are at least two disadvantages, which can to some extent be traded off 
against each other. One is that generating the s-boxes has a cost. The other is 
that the generated s-boxes are not optimized and may even be weak. 

A Blowfish-like cipher might, with some increase in set-up cost, avoid spe- 
cific weaknesses in its s-boxes. Schneier discusses checking for identical rows in 
Blowfish [6, page 339] but considers this unnecessary. In general, it is clearly 
possible to add checks which avoid weaknesses in randomly-generated s-boxes 
for Blowfish-like ciphers, but it is not clear whether or when this is worth doing. 

On the other hand, generating cryptographically strong s-boxes at run time 
in a Blowfish-like cipher is impractical, at least in software. Mister and Adams 
[4] report using 15 to 30 days of Pentium time to generate one 8 x 32 s-box 
suitable for CAST-128, after considerable work to produce efficient code. This is 
several orders of magnitude too slow for a run-time operation, even for one used 
only at set-up time and not in the actual cipher. 



2.4 Resistance to Attack 

Schneier [6, p.298] summarizes the usefulness of randomly-generated s-boxes 
with respect to the most powerful statistical attacks currently known in his 
introduction to the Biham and Biryukov work on DES with permuted s-boxes
[2]:

“Linear and differential cryptanalysis work only if the analyst knows the 
composition of the s-boxes. If the s-boxes are key-dependent and chosen by a 
cryptographically strong method, then linear and differential cryptanalysis are 
much more difficult. Remember, though, that randomly-generated s-boxes have 
very poor differential and linear characteristics, even if they are secret.” 

This inherent dilemma leads to the proposal presented in this paper: we 
suggest s-boxes that are key-dependent but are not randomly generated. 
3 The Proposal

Start with carefully-prepared strong s-boxes, such as those described for CAST 
in Mister and Adams [4] and apply key-dependent operations to them before 
use. The goal is to introduce additional entropy so that attacks which depend 
on knowledge of the s-boxes become impractical, without changing the properties 
which make the s-boxes strong. 

We apply the operations before encryption begins and use the modified s- 
boxes for the actual encryption, so the overhead is exclusively in the set-up 
phase. There is no increase in the per-block encryption cost. 

It can be shown that important properties of strong s-boxes are preserved 
under carefully-chosen key-dependent operations. Given this, it is possible to 
prepare strong s-boxes off-line (as in CAST-128) and manipulate them at ci- 
pher set-up time to get provably strong key-dependent s-boxes (in contrast with 
ciphers such as Blowfish). 

The question is what operations are suitable; that is, what operations are key- 
dependent, reasonably efficient, and guaranteed not to destroy the cryptographic 
properties of a strong s-box. 

The first two requirements can be met relatively easily; simultaneously achiev- 
ing the third is somewhat more difficult. However, several classes of operations 
may be used. 

— Permuting s-box columns 

• this has the effect of permuting output bits. 

— Adding affine functions to s-box columns 

• this has the effect of complementing output bits, possibly depending 
upon the values of other output bits. 

— Permuting s-box inputs 

• this has the effect of producing certain s-box row permutations. 

— Adding affine functions to s-box inputs 

• this has the effect of producing other s-box row permutations, possibly 
depending upon the values of other input bits. 

In general, then, the Boolean function for an s-box column may be modified
from

$$f(x) = f(x_1, x_2, x_3, \ldots, x_m),$$

for binary variables $x_i$, to

$$f\big(P(g_1(x), g_2(x), g_3(x), \ldots, g_m(x))\big) \oplus h(x),$$

for some Boolean functions $g_i(x)$ and $h(x)$ and a permutation $P$. The set of
columns may then be further permuted. We will consider the above classes of
operations in the order presented.
3.1 Permuting S-Box Columns

Permuting s-box columns can be accomplished by permuting each row in the 
same way (done in one loop through the s-box). 

Various important properties are conserved under this operation. In partic- 
ular, if a column is bent, it will clearly remain so when moved; if a group of 
columns satisfies the bit independence criterion, it will still do so after being 
permuted. Finally, since s-box nonlinearity is defined to be the minimum non- 
linearity of any function in the set of all non-trivial linear combinations of the 
columns (see [4], for example), then s-box nonlinearity is also conserved through 
a column permutation. 
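Because a column permutation moves the same bit positions in every row, a single pass over the s-box is enough. In this minimal C sketch the array `perm` is a hypothetical key-derived input: bit j of each row moves to bit perm[j]. How the key is turned into `perm` is not specified here.

```c
#include <stdint.h>

/* Permute the 32 output columns of an 8 x 32 s-box in place: bit j of
   every row moves to bit perm[j]. Applying the same bit permutation to
   each row is exactly a column permutation, done in one pass. */
void permute_columns(uint32_t sbox[256], const uint8_t perm[32])
{
    for (int row = 0; row < 256; row++) {
        uint32_t in = sbox[row], out = 0;
        for (int j = 0; j < 32; j++)
            out |= ((in >> j) & 1u) << perm[j];
        sbox[row] = out;
    }
}
```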

In carefully designed s-boxes, rearranging the columns in a key-dependent 
way does not degrade cryptographic strength. However, such an operation can 
make it significantly more difficult to align characteristics in a linear crypt- 
analysis attack and so can increase the security of the cipher by raising the 
computational complexity of mounting this attack. 



3.2 Adding Affine Functions to S-Box Columns

In the extreme case in which the affine functions are simply all-one vectors, the 
addition can be done by XORing a constant into all rows (done in one loop 
through the s-box). More generally, other techniques (perhaps involving storage
of specific affine vectors) may be necessary to accomplish this addition. 

Various important properties are conserved under this operation. Because 
the nonlinearity of a Boolean function is unchanged by the addition of an affine 
function, s-box column bentness, s-box bit independence criterion, and s-box 
nonlinearity are all conserved. 
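The invariance can be checked numerically on a small example. This C sketch is our own illustration and is not taken from [4]. It computes nonlinearity from the Walsh-Hadamard spectrum of a 4-variable Boolean function, using the standard relation NL(f) = 2^(n-1) - max|W(w)|/2, and lets you compare a function with the same function plus an affine term. The example functions are arbitrary choices.

```c
#include <stdlib.h>

#define NVARS 4
#define NPTS (1 << NVARS)   /* truth-table size, 2^n */

/* Parity (0 or 1) of the bits of v; with v = w & x this is w.x. */
int parity(unsigned v)
{
    int p = 0;
    while (v) {
        p ^= (int)(v & 1u);
        v >>= 1;
    }
    return p;
}

/* Nonlinearity of f (truth table with entries 0/1) from its spectrum
   W(w) = sum_x (-1)^(f(x) ^ w.x):  NL = 2^(n-1) - max|W| / 2. */
int nonlinearity(const int f[NPTS])
{
    int maxabs = 0;
    for (unsigned w = 0; w < NPTS; w++) {
        int sum = 0;
        for (unsigned x = 0; x < NPTS; x++)
            sum += (f[x] ^ parity(w & x)) ? -1 : 1;
        if (abs(sum) > maxabs)
            maxabs = abs(sum);
    }
    return NPTS / 2 - maxabs / 2;
}
```

With f = x1x2 ⊕ x3x4 (bent) the nonlinearity is 6 both before and after adding x1 ⊕ x3 ⊕ 1. With the cubic x1x2x3 it is 2 before and after adding x4 ⊕ 1.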

The addition of affine functions, therefore, does nothing to degrade cryp- 
tographic security in the s-boxes. However, such an operation, by modifying 
the contents of the s-boxes in a key-dependent way, can make it significantly 
more difficult to construct characteristics in a differential cryptanalysis attack 
(because it cannot be computed in advance when the XOR of two given s-box 
outputs will produce one value or another). Hence, this operation can increase 
the security of the cipher by raising the computational complexity of mounting 
this attack. 



3.3 Permuting S-Box Inputs

Permuting the rows of an s-box seems attractive because of its potential for 
thwarting linear and differential cryptanalysis. However, it is not always possible 
to permute rows without compromising desirable s-box properties. In particular 
(e.g., for CAST-128 s-boxes), not all row permutations are permissible if column 
bentness is to be preserved. 

Biham and Biryukov [2] made only one small change to the s-box row order 
in the DES s-boxes: they used one key bit per s-box, controlling whether or not 
the first two and the last two rows should be swapped. However, an operation 
that used more key bits and that provably could not weaken strong s-boxes may
be preferable in some environments. 

One such operation is to use the subset of row permutations that result from 
a permutation on the s-box inputs. We will show that these do not damage the 
desirable s-box properties. 

Mister and Adams [4] introduce the notion of dynamic distance of order $j$
for a function $f$ and define it as

$$DD_j(f) = \max_{d} \frac{1}{2} \left| 2^{m-1} - \sum_{x} \big( f(x) \oplus f(x \oplus d) \big) \right|,$$

where both $d$ and $x$ are binary vectors of length $m$, $d$ ranges through all values
with Hamming weight $1 \le \mathrm{wt}(d) \le j$ and $x$ ranges through all possible values.

It is shown in [4] that cryptographic properties such as Strict Avalanche 
Criterion (SAC) and Bit Independence Criterion (BIC), higher-order versions 
of these (HOSAC and HOBIC), maximum order versions of these (MOSAC 
and MOBIC), and distances from these (DSAC, DBIC, DHOSAC, DHOBIC, 
DMOSAC, and DMOBIC) can all be defined in terms of dynamic distance. 

In an s-box, all bit positions play the same role: there is no most- or
least-significant bit in either the input or the output. Thus, permuting the bits
of x in the formula above does not change the value of the summation term for a
given d, provided we apply the same permutation to d. Since the permuted d has the
same Hamming weight as d, the set of values over which the maximum is taken is
unchanged, and hence so is the maximum (the value of the dynamic distance).

Therefore, column properties defined in terms of dynamic distance (DSAC, 
DHOSAC, and DMOSAC) remain unchanged. In particular, if the columns are 
bent functions (i.e., DMOSAC = 0) then permuting inputs preserves bentness. 

Furthermore, the s-box properties DBIC, DHOBIC, and DMOBIC also remain
unchanged because these are defined in terms of the dynamic distance of a
Boolean function f composed of the XOR of a subset of s-box columns. (Note
that permuting the inputs of each of the column functions with a fixed permutation
P is identical to permuting the inputs of the combined function f using
P.) By a similar line of reasoning, s-box nonlinearity is also unaffected by a
permutation of its input bits.
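The following C sketch evaluates the dynamic distance definition directly on 4-variable truth tables. It lets you check that permuting the input bits leaves DD_j unchanged for every order j. The test function and the permutation are arbitrary illustrative choices, and bit i of x holds x_{i+1}.

```c
#include <stdlib.h>

#define MVARS 4                 /* number of input variables m */
#define NVALS (1 << MVARS)      /* number of input values, 2^m */

/* Hamming weight of v. */
int hw(unsigned v)
{
    int w = 0;
    while (v) {
        w += (int)(v & 1u);
        v >>= 1;
    }
    return w;
}

/* DD_j(f) = max over d with 1 <= wt(d) <= j of
   (1/2) | 2^(m-1) - sum_x f(x) ^ f(x ^ d) |, f a 0/1 truth table.
   The sum is always even, so the halving is exact. */
int dynamic_distance(const int f[NVALS], int j)
{
    int best = 0;
    for (unsigned d = 1; d < NVALS; d++) {
        if (hw(d) > j)
            continue;
        int s = 0;
        for (unsigned x = 0; x < NVALS; x++)
            s += f[x] ^ f[x ^ d];
        int dd = abs(NVALS / 2 - s) / 2;
        if (dd > best)
            best = dd;
    }
    return best;
}

/* g(x) = f(P(x)), where P moves bit i of x to bit perm[i]. */
void permute_inputs(const int f[NVALS], const int perm[MVARS], int g[NVALS])
{
    for (unsigned x = 0; x < NVALS; x++) {
        unsigned px = 0;
        for (int i = 0; i < MVARS; i++)
            px |= ((x >> i) & 1u) << perm[i];
        g[x] = f[px];
    }
}
```

For a bent column such as x1x2 ⊕ x3x4, the sum equals 2^(m-1) for every nonzero d, so DD_m is 0, matching DMOSAC = 0.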

Permuting inputs, therefore, does nothing to degrade cryptographic security 
in the s-boxes. However, such an operation, by rearranging the order of the s-box 
rows in a key-dependent way, can make it significantly more difficult to construct 
linear or differential characteristics (because specific outputs corresponding to 
specific inputs cannot be predicted in advance). Hence, this operation can in- 
crease the security of the cipher by raising the computational complexity of 
mounting these attacks. 
3.4 Adding Affine Functions to S-Box Inputs

Adding selected affine functions to s-box inputs is another effective way of pro- 
ducing a subset of row permutations that does not reduce the cryptographic 
security of the s-box. 

In the extreme case in which the affine functions are constant values, the
addition simply complements some of the s-box input bits. Inverting some input
bits is equivalent to XORing a constant binary vector $c$ into the input, making
the summation in the dynamic distance

$$\sum_{x} \big( f(x \oplus c) \oplus f(x \oplus c \oplus d) \big).$$

Clearly $(x \oplus c)$ goes through the same set of values that $x$ goes through, so
this does not change the sum and, consequently, does not change the dynamic
distance. Therefore, column properties and s-box properties are unchanged.

In the more general case in which the affine functions are not constant values,
the addition conditionally complements some of the s-box inputs (depending
upon the particular values of some subset of input variables). Consider the
following restriction. Choose any $k$ input variables and leave these unchanged. For the
remaining $m-k$ input variables, augment each with the same randomly chosen,
but fixed, affine function of the chosen $k$ input variables. For example, in a $4 \times n$
s-box, we may choose input variables $x_1$ and $x_2$ to be unchanged and augment $x_3$
and $x_4$ with the affine function $g(x_1, x_2) = x_2 \oplus 1$, so that the Boolean function
$f_i(x_1, x_2, x_3, x_4)$ defining each s-box column $i$ becomes

$$f_i\big(x_1, x_2, x_3 \oplus g(x_1, x_2), x_4 \oplus g(x_1, x_2)\big) = f_i(x_1, x_2, x_3 \oplus x_2 \oplus 1, x_4 \oplus x_2 \oplus 1).$$

With the operation restricted in this way it is not difficult to see that as
the chosen $k$ variables go through their values, at each stage the remaining $m-k$
variables go through all their values (either all simultaneously complemented,
or all simultaneously not complemented, depending upon the binary value of
the affine function). Thus, rewriting the summation in the dynamic distance
equation as

$$\sum_{x} \big( f(x') \oplus f(x' \oplus d) \big),$$

where $x'$ is $x$ transformed in accordance with the restriction as specified, we see
that $x'$ goes through the full set of values that $x$ goes through, so the sum is
unchanged and the resulting dynamic distance is unchanged.
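For the 4-variable example above, a short C sketch can apply the map (x3, x4) → (x3 ⊕ x2 ⊕ 1, x4 ⊕ x2 ⊕ 1) to a column's truth table and compare nonlinearity before and after. It checks only one bent and one non-bent function. Because a 4-variable function is bent exactly when its nonlinearity is 6, the bent check also confirms that bentness survives. The functions are illustrative choices.

```c
#include <stdlib.h>

#define NV 4
#define NP (1 << NV)

/* Parity of v; with v = w & x this is the dot product w.x. */
int dot_parity(unsigned v)
{
    int p = 0;
    while (v) { p ^= (int)(v & 1u); v >>= 1; }
    return p;
}

/* Nonlinearity from the Walsh-Hadamard spectrum: 2^(n-1) - max|W| / 2. */
int nonlin(const int f[NP])
{
    int maxabs = 0;
    for (unsigned w = 0; w < NP; w++) {
        int sum = 0;
        for (unsigned x = 0; x < NP; x++)
            sum += (f[x] ^ dot_parity(w & x)) ? -1 : 1;
        if (abs(sum) > maxabs) maxabs = abs(sum);
    }
    return NP / 2 - maxabs / 2;
}

/* The restricted map from the example: x1, x2 unchanged, and
   g(x1, x2) = x2 ^ 1 XORed into both x3 and x4 (bit 0 of x holds x1). */
unsigned affine_input_map(unsigned x)
{
    unsigned g = ((x >> 1) & 1u) ^ 1u;
    return x ^ (g << 2) ^ (g << 3);
}

/* Column after the operation: out(x) = f(x1, x2, x3 ^ g, x4 ^ g). */
void apply_map(const int f[NP], int out[NP])
{
    for (unsigned x = 0; x < NP; x++)
        out[x] = f[affine_input_map(x)];
}
```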

Adding affine functions (restricted as specified above¹) to s-box inputs,
therefore, does not degrade cryptographic security in the s-boxes. Like permuting
inputs, this operation, by rearranging the order of the s-box rows in a
key-dependent way, can make it significantly more difficult to construct linear or
differential characteristics. The security of the cipher may therefore be increased
by raising the computational complexity of mounting these attacks.

¹ Note that other restrictions on the type and number of affine functions that may be
added to preserve s-box properties may also exist. This area is for further research.



3.5 Other Possible Operations 

Other key-dependent operations that preserve s-box properties are also possible. 
For example, it is theoretically possible to construct strong s-boxes with more 
than 32 columns and select columns for actual use at set-up time, but this would 
likely be prohibitively expensive in practice since Mister and Adams [4] report 
that s-box generation time doubles for each additional column. 

Another possibility is to order the s-boxes in a key-dependent way. This is not 
particularly useful with only four s-boxes since only 4! orders are possible, adding 
less than five bits of entropy to the key space. However, with the eight s-boxes 
in CAST-128, this operation becomes somewhat more attractive. A CAST-143
might be created in a very straightforward way: since log2(8!) ≈ 15.3, 15 bits of
key can put the eight s-boxes into some key-dependent order (cheaply, by adjusting
pointers), and then key expansion and encryption proceed exactly as in CAST-128
except with the s-boxes in the new order. The overhead (set-up time) is quite low
and the new cipher uses 15 extra bits of unexpanded key.
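Reordering the s-boxes costs only pointer moves. In this C sketch, `perm` is a hypothetical permutation of 0..7 already derived from the 15 key bits. That derivation is left open here, as it is in the text. The tables themselves are never copied.

```c
#include <stdint.h>

/* Put eight s-boxes into a key-dependent order by moving pointers only.
   perm is a hypothetical permutation of 0..7 derived from 15 key bits;
   afterwards boxes[i] points at the table that was boxes[perm[i]]. */
void order_sboxes(const uint32_t *boxes[8], const uint8_t perm[8])
{
    const uint32_t *orig[8];
    for (int i = 0; i < 8; i++)
        orig[i] = boxes[i];
    for (int i = 0; i < 8; i++)
        boxes[i] = orig[perm[i]];
}
```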



3.6 Limitations in Key-Dependent Operations 

Ciphers With XOR-Only Round Functions A cipher which combines
s-box outputs with XOR, such as the CAST example in Applied Cryptography
[6, page 334], does not work well with some types of s-box manipulation. For
example, permuting the order of the four round function s-boxes is of no benefit
in such a cipher, since XOR is commutative.

XORing different constants into the four s-boxes in such a cipher has exactly 
the same effect as XORing a single constant into each round function output, or 
into any one s-box. 
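The collapse of four XOR constants into one can be seen directly. This toy C sketch uses four hypothetical 8 x 32 s-boxes combined with XOR only. For every input, XORing c[k] into each row of s-box k changes the round output by exactly c[0] ^ c[1] ^ c[2] ^ c[3].

```c
#include <stdint.h>

/* Toy XOR-only round function over four hypothetical 8 x 32 s-boxes. */
uint32_t f_xor(uint32_t s[4][256], uint32_t x)
{
    return s[0][x >> 24] ^ s[1][(x >> 16) & 0xFF] ^
           s[2][(x >> 8) & 0xFF] ^ s[3][x & 0xFF];
}

/* XOR the constant c[k] into every row of s-box k. */
void xor_rows(uint32_t s[4][256], const uint32_t c[4])
{
    for (int k = 0; k < 4; k++)
        for (int i = 0; i < 256; i++)
            s[k][i] ^= c[k];
}
```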

Furthermore, if the cipher’s round function combines its input and the round 
key with XOR, then XORing a constant into the output of one round is equiva- 
lent to XORing that constant into the key of the next round. If the round keys 
are already effectively random, unrelated, and unknown to the attacker (as they 
should be), then XORing them with a constant does not improve them. 

In terms of the difficulty of an attack, then, the net effect of XORing four 
constants into the s-boxes is equivalent to XORing a single constant into the 
output of the last round, for a cipher which uses XOR both to combine s-box 
outputs and to combine round input with the round key. 



Ciphers With Mixed Operations Combining S-Box Outputs A cipher 
which uses operations other than XOR to combine s-box outputs, such as Blow- 
fish or CAST-128, will give different round outputs if the order of the s-boxes is 
changed or if a constant is XORed into each row. This makes these operations 
more attractive in such ciphers. 
Even in such ciphers, however, the precise cryptographic strength of XORing
a constant into the rows is unclear. Certainly it is an inexpensive way to mix 
many key bits (128 if the cipher uses four m x 32 s-boxes) into the encryption, 
but it is not clear exactly how much this increases the effective key length. 

Ciphers With Mixed Operations Combining Key and Input In Blowfish 
and in some CAST ciphers, the round input is XORed with the round key at the 
start of a round, then split into four bytes which become inputs to the four s- 
boxes. XORing an 8-bit constant into each s-box input is equivalent to XORing a 
32-bit constant (the concatenation of the 8-bit constants) into each of the round 
keys. 

Suppose an attack exists that discovers the round keys when the s-boxes are 
known. Then the same attack works against the same cipher with s-boxes that 
are known but have had their rows permuted in this way. The attack discovers 
a different set of round keys equivalent to the real ones XORed with a 32-bit 
constant, but it still breaks the cipher, and with no extra work for the attacker. 
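The equivalence can be checked on a toy Blowfish-like round in C. The round key is XORed into the input, the four bytes index four hypothetical s-boxes, and the outputs are combined with a mix of XOR, subtraction and addition. Rows permuted by XORing a constant c[k] into the input of s-box k give the same round output as the original s-boxes with round key k ^ C, where C is the concatenation of the four constants. The s-box contents and constants are arbitrary.

```c
#include <stdint.h>

/* Toy round: XOR the round key into the input, split it into four bytes,
   look each up in its own s-box, combine with mixed operations. */
uint32_t round_f(uint32_t s[4][256], uint32_t x, uint32_t k)
{
    uint32_t i = x ^ k;
    return ((s[0][i >> 24] ^ s[1][(i >> 16) & 0xFF]) - s[2][(i >> 8) & 0xFF])
           + s[3][i & 0xFF];
}

/* Permute the rows of each s-box by XORing an 8-bit constant into its
   input: out[k][b] = in[k][b ^ c[k]]. */
void xor_inputs(uint32_t in[4][256], const uint8_t c[4], uint32_t out[4][256])
{
    for (int k = 0; k < 4; k++)
        for (int b = 0; b < 256; b++)
            out[k][b] = in[k][b ^ c[k]];
}
```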

However, for ciphers that use other operations to combine the round input 
and the round key (CAST-128, for example, which uses addition and subtraction 
modulo 2^32 for input masking in some of its rounds), such an operation seems
to add value. 

Options For both inputs and outputs, the addition of affine functions appears 
stronger than just XORing with a constant, and performing permutations ap- 
pears to be stronger again (but at much higher computational cost). In a practi- 
cal cipher, however, there appears to be no disadvantage to using XOR (for both 
input and output if mixed operations are used everywhere in the round function) 
because it is inexpensive and offers some protection against the construction of 
iterated characteristics. 

4 Practical Considerations 

4.1 Stage One

Since XORing a constant into the s-box rows is the cheapest way to bring many 
extra key bits into play, we should do that if we’re going to use this approach at
all. The cipher’s round function should use operations other than XOR to mix 
s-box outputs so that this will be effective. 

If we are iterating through the s-box rows for that, it makes sense to permute 
the columns in the same loop. We suggest simply rotating each row under control 
of 5 bits of key. A CAST-128 implementation will have code for this, since the 
same rotation is used in the round function, and rotation is reasonably efficient. 

At this point, we have used 37 bits per s-box, 148 bits in all. In many appli- 
cations, this will be quite sufficient. 

Costs of this are minimal: 1024 XOR and rotation operations. This is much 
less than CAST-128’s round key generation overhead, let alone Blowfish’s work 
to generate s-boxes and round keys. 
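Stage One amounts to one pass per s-box. The following minimal C sketch assumes the 5-bit rotation amount and the 32-bit XOR constant have already been taken from the key. The names `k_rot` and `k_xor` are ours.

```c
#include <stdint.h>

/* Rotate left by n (taken mod 32), avoiding undefined behaviour for n == 0. */
uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Stage One for one 8 x 32 s-box: rotate every row by the same 5-bit key
   value (a column permutation), then XOR in a 32-bit key constant.
   37 key bits, 256 rotate-and-XOR operations. */
void stage_one(uint32_t sbox[256], unsigned k_rot, uint32_t k_xor)
{
    for (int i = 0; i < 256; i++)
        sbox[i] = rotl32(sbox[i], k_rot & 31u) ^ k_xor;
}
```

Calling it once for each of the four round-function s-boxes gives the 1024 rotate-and-XOR operations mentioned above.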
4.2 Stage Two

To go beyond that, you can add affine functions to s-box columns or you can 
permute s-box rows in a manner equivalent to permuting the input bits. 

The choice would depend upon the relative strength of these two methods, 
along with the relation between their overheads and the resources available in a 
particular application. In our suggested implementation, using affine functions 
requires more storage while permuting the rows involves more computation. 
Neither operation looks prohibitively expensive in general, but either might be 
problematic in some environments. 

For purposes of this paper, we will treat permuting the inputs as the next 
thing to add, and then go on to look at adding affine functions. 

To permute the rows in a manner equivalent to permuting the input bits we 
add the following mechanism. We use a 256-row array, each row composed of an 
8-bit index and a 32-bit output. We can rearrange rows as follows: 

— put the (256*32)-bit s-box in the output part of the array; 

— fill the index part with the 8-bit values in order from hex 00 to FF;

— operate in some way on the index parts (without affecting the 32-bit s-box 
rows) so as to give each row a new index; 

— sort the 256 rows so that the index parts are again in order (00 to FF), 
moving the s-box rows along with the indexes so they are in a new order; 

— discard the index portion. 

This results in a cryptographically identical s-box with rows in the new order. 
The operations permitted in the third step for changing the 8-bit values are 
just those which are equivalent to permuting and inverting the s-box inputs. We 
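can XOR a constant into all index rows or we can permute index columns. XORing
a constant leaves the XOR difference between index rows unchanged, and permuting
index columns only permutes the bits of that difference (preserving its Hamming
weight), so cryptographic properties are conserved as shown earlier.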
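The five-step mechanism can be sketched in C with the standard library's `qsort`. Here the bit permutation is passed in as an explicit array `perm`, and an optional constant `xor_in` complements index bits. Both are hypothetical stand-ins for key material and are not the 15-bit-key function the text calls for. After the sort, row r of the original s-box sits at position perm(r) ^ xor_in.

```c
#include <stdint.h>
#include <stdlib.h>

/* One s-box row tagged with an 8-bit index. */
struct row {
    uint8_t  index;
    uint32_t out;
};

static int by_index(const void *a, const void *b)
{
    const struct row *ra = a, *rb = b;
    return (int)ra->index - (int)rb->index;
}

/* Permute the 8 bits of v: bit i moves to bit perm[i]. */
uint8_t permute_bits8(uint8_t v, const uint8_t perm[8])
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
        r |= (uint8_t)(((v >> i) & 1u) << perm[i]);
    return r;
}

/* Reorder sbox rows as if its input bits were permuted (and XORed with
   xor_in). New indices are distinct, so the unstable qsort is safe. */
void permute_rows(uint32_t sbox[256], const uint8_t perm[8], uint8_t xor_in)
{
    struct row rows[256];
    for (int i = 0; i < 256; i++) {            /* steps 1 and 2 */
        rows[i].index = (uint8_t)i;
        rows[i].out = sbox[i];
    }
    for (int i = 0; i < 256; i++)              /* step 3: new indices */
        rows[i].index = (uint8_t)(permute_bits8(rows[i].index, perm) ^ xor_in);
    qsort(rows, 256, sizeof rows[0], by_index); /* step 4 */
    for (int i = 0; i < 256; i++)              /* step 5: drop indices */
        sbox[i] = rows[i].out;
}
```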

XORing a constant into each index row is of little benefit. This is also true of 
rotation, which uses only 3 bits per s-box (hardly enough to justify the overhead
of sorting the s-box rows).

To operate usefully on the inputs, then, we should do a full permutation on 
the index columns. In code, this would need a function to permute 8-bit values 
under control of a 15-bit key. It would use 15 key bits per s-box. 

At this point, we are using 52 bits per s-box, 208 bits in all, and are permuting 
both rows and columns or both inputs and outputs. Again, this would be quite 
sufficient for many applications. 



4.3 Stage Three 

We can, however, go further by adding affine functions to the columns. 

There are exactly 512 affine Boolean functions of 8 variables. In theory, it 
would be possible to add a key-selected affine function to each s-box column, 
using 9 bits of key per column, or 1152 bits for a set of four 8 x 32 s-boxes, but 
this seems unacceptably expensive in practice. 
Since the complement of an affine function is also affine, we need only store half
of the 512 possible functions to have them all available. Consider a (256*256)-bit
Boolean array with affine functions in all columns and no two columns either
identical or complements of each other. From this, create four 256 x 32 arrays.
This can be done using log2(C(256,128)) ≈ 251 bits of key, but implementing this
would also be expensive. As a more practical alternative, using
log2(C(16,8)) ≈ 13.65, that is, 13 bits of key to select eight of
sixteen “chunks of 16 columns” from the original array may be a reasonable
compromise.
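The count of 512 affine functions, and the fact that half of them are complements of the other half, can be checked with a small C sketch. It builds each affine function a(x) = (w · x) ⊕ c of 8 variables as a 256-bit truth table. Storing the truth table as four 64-bit words is our own illustrative choice.

```c
#include <stdint.h>
#include <string.h>

/* 256-bit truth table of an 8-variable Boolean function: bit x holds f(x). */
typedef struct {
    uint64_t w[4];
} tt256;

/* Parity of the low 8 bits of v. */
static int parity8(unsigned v)
{
    v &= 0xFFu;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (int)(v & 1u);
}

/* Affine function a(x) = (mask . x) XOR c: 256 masks times 2 constants
   give the 2^9 = 512 affine functions of 8 variables. The c = 0 half
   (the linear functions) is enough to store; c = 1 gives complements. */
tt256 affine_tt(unsigned mask, unsigned c)
{
    tt256 t;
    memset(&t, 0, sizeof t);
    for (unsigned x = 0; x < 256; x++)
        if (parity8(mask & x) ^ (int)(c & 1u))
            t.w[x >> 6] |= (uint64_t)1 << (x & 63);
    return t;
}
```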



4.4 Putting It All Together 

Given four 256 x 32 arrays of affine functions, we add a few operations inside
the loop that runs through each s-box. The inner loop ends up as follows (in
pseudo-C with “<<<” for rotation and “^” for XOR):

unsigned *s, *a;      // pointers to s-box & affine array
unsigned char *p;     // pointer into index array

// initialize pointers here...

for( i = 0 ; i < 256 ; i++, s++, a++, p++ )
{
    *s = (*s <<< k1) ^ (*a <<< k2) ^ k3;   // 5+5+32 key bits
    *p = permute8(*p, k4);                 // 15 key bits
}

qsort();   // re-sort rows by new index (arguments omitted)

This uses 57 key bits per s-box inside the loop (228 for all four s-boxes). With
the 13 used outside the loop setting up the A-boxes, and another 15 used in
re-arranging the original 8 s-boxes, we have 256 key bits in total exclusively used
for s-box manipulations.

5 Further Work 

Further work in this area can be done both on the theoretical side and on the
practical side. For example, a formal characterization of the set of affine
functions that can be added to an s-box without reducing its cryptographic strength
(beyond the subset specified in this paper) would be of interest. As well, since
factorials do not correspond to powers of 2, more precise practical specifications
need to be given to convert expressions such as “log2(8!)” to bit lengths (i.e.,
it needs to be stated exactly which particular permutation corresponds to each
value of a 15-bit key segment).
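One possible convention is the factorial number system (Lehmer code). It is offered here only as an assumption and is not the authors' specification. The key segment is read as digits in bases 8, 7, ..., 1, and each digit picks the next element from the items not yet used. With a 15-bit segment, only 32768 of the 40320 orders of eight items are reachable, which is exactly the mismatch noted above.

```c
#include <stdint.h>

/* Map a key segment to a permutation of n items (n <= 8) via the
   factorial number system. Distinct values below n! give distinct
   permutations; a 15-bit segment reaches 32768 of the 8! = 40320. */
void key_to_perm(uint32_t key, int n, uint8_t perm[])
{
    uint8_t pool[8];
    for (int i = 0; i < n; i++)
        pool[i] = (uint8_t)i;
    for (int i = n; i >= 1; i--) {
        uint32_t pick = key % (uint32_t)i;      /* next digit, base i */
        key /= (uint32_t)i;
        perm[n - i] = pool[pick];
        for (int j = (int)pick; j < i - 1; j++) /* remove the chosen item */
            pool[j] = pool[j + 1];
    }
}
```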

6 Conclusions 

This paper has proposed the concept of using key-dependent s-box manipulations
to strengthen specific block ciphers against attacks which depend upon
knowledge of the s-box contents (such as linear and differential cryptanalysis
and their variations). The manipulations described include a permutation of the
output bits, a permutation of the input bits, the addition of affine functions to
the s-box columns, and the addition of a restricted set of affine functions to the
s-box inputs. It has been shown (using the concept of dynamic distance [4]) that
such manipulations do not degrade the cryptographic properties of carefully- 
constructed s-boxes, and therefore do not degrade the cryptographic strength of 
the corresponding ciphers with respect to existing analysis. On the contrary, it 
is possible that cryptographic strength may be substantially increased by such 
manipulations because the most effective cryptanalytic attacks to date would 
appear to require a significant exhaustive search phase in addition to their cur- 
rent complexity in order to be mounted against ciphers with such “hidden” s-box 
contents. 

Some implementation considerations for this proposal were also discussed, 
and options were presented with respect to the level of complexity that might 
be employed in various environments. 
Abstract. Twofish is a new block cipher with a 128-bit block, and a
key length of 128, 192, or 256 bits, which has been submitted as an AES
candidate. In this paper, we briefly review the structure of Twofish, and
then discuss the key schedule of Twofish, and its resistance to attack. We
close with some open questions on the security of Twofish’s key schedule.



1 Introduction 



NIST announced the Advanced Encryption Standard (AES) program in 1997.
NIST solicited comments from the public on the proposed standard,
and eventually issued a call for algorithms to satisfy the standard.
The intention is for NIST to make all submissions public and eventually, through
a process of public review and comment, choose a new encryption standard to
replace DES.

Twofish is our submission to the AES selection process. It meets all the 
required NIST criteria — 128-bit block; 128-, 192-, and 256-bit key; efficient on 
various platforms; etc. — and some strenuous design requirements, performance 
as well as cryptographic, of our own. 

Twofish was designed to meet NIST’s design criteria for AES.
Specifically, they are:



— A 128-bit symmetric block cipher. 

— Key lengths of 128 bits, 192 bits, and 256 bits. 

— No weak keys. 

— Efficiency, both on the Intel Pentium Pro and other software and hardware 
platforms. 

— Flexible design: e.g., accept additional key lengths; be implementable on 
a wide variety of platforms and applications; and be suitable for a stream 
cipher, hash function, and MAC. 

— Simple design, both to facilitate ease of analysis and ease of implementation. 



S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 27- 1999.
© Springer-Verlag Berlin Heidelberg 1999
A central feature of Twofish’s security and flexibility is its key schedule. In
this paper, we will briefly review the design of Twofish, and then discuss the
security features of the key schedule. The remainder of the paper is organized as
follows. First, we discuss the specific design of Twofish. Next, we analyze the Twofish
key schedule in some detail. Finally, we point out some open questions with
respect to the key schedule. Note that for space reasons, this paper does not
include a complete discussion of the Twofish design. Instead, we refer the reader
to http://www.counterpane.com.



2 Twofish 

Twofish uses a 16-round Feistel-like structure with additional whitening of the
input and output. The only non-Feistel elements are the 1-bit rotates. The
rotations can be moved into the F function to create a pure Feistel structure, but
this requires an additional rotation of the words just before the output whitening 
step. 

The plaintext is split into four 32-bit words. In the input whitening step, these 
are XORed with four key words. This is followed by sixteen rounds. In each round, 
the two words on the left are used as input to the g functions. (One of them is 
rotated by 8 bits first.) The g function consists of four byte-wide key-dependent
S-boxes, followed by a linear mixing step based on an MDS matrix. The results of
the two g functions are combined using a Pseudo-Hadamard Transform (PHT),
and two key words are added. These two results are then XORed into the words
on the right (one of which is rotated left by 1 bit first, the other is rotated right 
afterwards). The left and right halves are then swapped for the next round. After 
all the rounds, the swap of the last round is reversed, and the four words are 
XORed with four more key words to produce the ciphertext. 

More formally, the 16 bytes of plaintext $p_0, \ldots, p_{15}$ are first split into 4 words
$P_0, \ldots, P_3$ of 32 bits each using the little-endian convention.

$$P_i = \sum_{j=0}^{3} p_{(4i+j)} \cdot 2^{8j}, \qquad i = 0, \ldots, 3$$

In the input whitening step, these words are XORed with 4 words of the expanded
key.

$$R_{0,i} = P_i \oplus K_i, \qquad i = 0, \ldots, 3$$

In each of the 16 rounds, the first two words are used as input to the function F, 
which also takes the round number as input. The third word is XORed with the 
first output of F and then rotated right by one bit. The fourth word is rotated 
left by one bit and then XORed with the second output word of F. Finally, the 
two halves are exchanged. Thus, 

$$(F_{r,0}, F_{r,1}) = F(R_{r,0}, R_{r,1}, r)$$
$$R_{r+1,0} = \mathrm{ROR}(R_{r,2} \oplus F_{r,0}, 1)$$
$$R_{r+1,1} = \mathrm{ROL}(R_{r,3}, 1) \oplus F_{r,1}$$
$$R_{r+1,2} = R_{r,0}$$
$$R_{r+1,3} = R_{r,1}$$

for $r = 0, \ldots, 15$, where ROR and ROL are functions that rotate their first
argument (a 32-bit word) right or left, respectively, by the number of bits
indicated by their second argument.
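F depends only on R_{r,0} and R_{r,1}, and these two words are copied unchanged into the next round. A round can therefore be undone without inverting F. A C sketch of one round and its inverse follows. `toy_f` is a hypothetical stand-in for F, used only to exercise the data flow; it is not Twofish's F.

```c
#include <stdint.h>

static uint32_t rol(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); } /* 1 <= n <= 31 */
static uint32_t ror(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); } /* 1 <= n <= 31 */

/* Any F taking (R0, R1, r) and producing two words. */
typedef void (*f_func)(uint32_t r0, uint32_t r1, int r, uint32_t out[2]);

/* One round, following the equations above. */
void round_fwd(uint32_t R[4], int r, f_func F)
{
    uint32_t f[2], n[4];
    F(R[0], R[1], r, f);
    n[0] = ror(R[2] ^ f[0], 1);
    n[1] = rol(R[3], 1) ^ f[1];
    n[2] = R[0];
    n[3] = R[1];
    for (int i = 0; i < 4; i++) R[i] = n[i];
}

/* Inverse round: recompute F from the words that were moved unchanged. */
void round_inv(uint32_t R[4], int r, f_func F)
{
    uint32_t f[2], o[4];
    o[0] = R[2];
    o[1] = R[3];
    F(o[0], o[1], r, f);
    o[2] = rol(R[0], 1) ^ f[0];
    o[3] = ror(R[1] ^ f[1], 1);
    for (int i = 0; i < 4; i++) R[i] = o[i];
}

/* Hypothetical stand-in for F, for exercising the round only. */
void toy_f(uint32_t r0, uint32_t r1, int r, uint32_t out[2])
{
    out[0] = r0 * 0x9E3779B9u + r1 + (uint32_t)r;
    out[1] = (r0 ^ (r1 << 3)) + 2u * (uint32_t)r;
}
```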

The output whitening step undoes the ‘swap’ of the last round, and XORs 
the data words with 4 words of the expanded key. 



$$C_i = R_{16,(i+2) \bmod 4} \oplus K_{i+4}, \qquad i = 0, \ldots, 3$$



The four words of ciphertext are then written as 16 bytes $c_0, \ldots, c_{15}$ using the
same little-endian conversion used for the plaintext.

$$c_i = \left\lfloor \frac{C_{\lfloor i/4 \rfloor}}{2^{8 (i \bmod 4)}} \right\rfloor \bmod 2^8, \qquad i = 0, \ldots, 15$$
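A direct C rendering of the two little-endian conversions follows. The function names are ours.

```c
#include <stdint.h>

/* P_i = sum_j p[4i+j] * 2^(8j): pack 16 bytes into four 32-bit words. */
void bytes_to_words(const uint8_t p[16], uint32_t P[4])
{
    for (int i = 0; i < 4; i++)
        P[i] = (uint32_t)p[4 * i]
             | (uint32_t)p[4 * i + 1] << 8
             | (uint32_t)p[4 * i + 2] << 16
             | (uint32_t)p[4 * i + 3] << 24;
}

/* c_i = floor(C[i/4] / 2^(8 (i mod 4))) mod 2^8. */
void words_to_bytes(const uint32_t C[4], uint8_t c[16])
{
    for (int i = 0; i < 16; i++)
        c[i] = (uint8_t)(C[i / 4] >> (8 * (i % 4)));
}
```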



2.1 The Function F 

The function F is a key-dependent permutation on 64-bit values. It takes three
arguments, two input words $R_0$ and $R_1$, and the round number $r$ used to select
the appropriate subkeys. $R_0$ is passed through the g function, which yields $T_0$.
$R_1$ is rotated left by 8 bits and then passed through the g function to yield
$T_1$. The results $T_0$ and $T_1$ are then combined in a PHT and two words of the
expanded key are added.



$$
\begin{aligned}
T_0 &= g(R_0) \\
T_1 &= g(\mathrm{ROL}(R_1, 8)) \\
F_0 &= (T_0 + T_1 + K_{2r+8}) \bmod 2^{32} \\
F_1 &= (T_0 + 2T_1 + K_{2r+9}) \bmod 2^{32}
\end{aligned}
$$

where $(F_0, F_1)$ is the result of F. We also define the function F' for use in our
analysis. F' is identical to the F function, except that it does not add any key
blocks to the output. (The PHT is still performed.)
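A sketch of F and F' in Python, assuming a hypothetical placeholder `identity_g` for g so that the PHT arithmetic is visible; the key words $K_{2r+8}$ and $K_{2r+9}$ are indexed as in the equations above.

```python
MASK32 = 0xFFFFFFFF


def rol(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def f_prime(r0, r1, g):
    """F': the two g evaluations followed by the PHT, with no subkeys added."""
    t0 = g(r0)
    t1 = g(rol(r1, 8))
    return (t0 + t1) & MASK32, (t0 + 2 * t1) & MASK32


def f_function(r0, r1, r, g, K):
    """F: F' plus the expanded-key words K_{2r+8} and K_{2r+9}."""
    p0, p1 = f_prime(r0, r1, g)
    return (p0 + K[2 * r + 8]) & MASK32, (p1 + K[2 * r + 9]) & MASK32


def identity_g(x):
    """Hypothetical placeholder for g, used only to expose the arithmetic."""
    return x


print(f_prime(1, 1, identity_g))  # (257, 513)
```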



2.2 The Function g

The function g forms the heart of Twofish. The input word X is split into four 
bytes. Each byte is run through its own key-dependent S-box. Each S-box is 
bijective, takes 8 bits of input, and produces 8 bits of output. The four results 
are interpreted as a vector of length 4 over GF($2^8$), and multiplied by the 4×4
MDS matrix (using the field GF($2^8$) for the computations). The resulting vector
is interpreted as a 32-bit word which is the result of g. 



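$$
\begin{aligned}
x_i &= \lfloor X / 2^{8i} \rfloor \bmod 2^8, \qquad i = 0, \ldots, 3 \\
y_i &= s_i[x_i], \qquad i = 0, \ldots, 3
\end{aligned}
$$

$$\begin{pmatrix} z_0 \\ z_1 \\ z_2 \\ z_3 \end{pmatrix} = \mathrm{MDS} \cdot \begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}$$

$$Z = \sum_{i=0}^{3} z_i \cdot 2^{8i}$$

where $s_i$ are the key-dependent S-boxes and $Z$ is the result of g. For this to be
well-defined, we need to specify the correspondence between byte values and the
field elements of GF($2^8$). We represent GF($2^8$) as GF(2)[x]/v(x), where
$v(x) = x^8 + x^6 + x^5 + x^3 + 1$ is a primitive polynomial of degree 8 over GF(2).
The field element $a = \sum_{i=0}^{7} a_i x^i$ with $a_i \in$ GF(2) is identified with the
byte value $\sum_{i=0}^{7} a_i 2^i$. This is in some sense the "natural" mapping; addition
in GF($2^8$) corresponds to a XOR of the bytes.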
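With this byte/field correspondence, multiplication becomes a shift-and-XOR loop with reduction by v(x), written as the bit mask 0x169. The sketch below also checks by brute force that x (the byte 0x02) generates all 255 nonzero elements, for v(x) and for the polynomial $w(x) = x^8 + x^6 + x^3 + x^2 + 1$ (0x14D) used by the RS code in Section 2.3; that is what "primitive" means here. The helper names are ours.

```python
V_POLY = 0x169  # v(x) = x^8 + x^6 + x^5 + x^3 + 1, used for the MDS multiply
W_POLY = 0x14D  # w(x) = x^8 + x^6 + x^3 + x^2 + 1, used for the RS multiply


def gf_mul(a: int, b: int, poly: int) -> int:
    """Multiply two bytes as elements of GF(2)[x]/poly.

    Bit i of a byte is the coefficient of x^i, so field addition is XOR.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # add a * x^t for the current bit t of b
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:            # degree 8 reached: reduce modulo poly
            a ^= poly
    return result


def order_of_x(poly: int) -> int:
    """Multiplicative order of x (byte 0x02); 255 means x generates the whole field."""
    e, n = 0x02, 1
    while e != 1 and n <= 255:
        e = gf_mul(e, 0x02, poly)
        n += 1
    return n


print(order_of_x(V_POLY), order_of_x(W_POLY))
```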

2.3 The Key Schedule 

The key schedule has to provide 40 words of expanded key $K_0, \ldots, K_{39}$, and the
4 key-dependent S-boxes used in the g function. Twofish is defined for keys of
length N = 128, N = 192, and N = 256. Keys of any length shorter than 256
bits can be used by padding them with zeroes until the next larger defined key
length.

We define $k = N/64$. The key $M$ consists of $8k$ bytes $m_0, \ldots, m_{8k-1}$. The
bytes are first converted into $2k$ words of 32 bits each

$$M_i = \sum_{j=0}^{3} m_{(4i+j)} \cdot 2^{8j}, \qquad i = 0, \ldots, 2k-1$$

and then into two word vectors of length $k$.

$$M_e = (M_0, M_2, \ldots, M_{2k-2})$$

$$M_o = (M_1, M_3, \ldots, M_{2k-1})$$
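The conversion of the key into $k$, $M_e$ and $M_o$, including the zero padding to the next defined length described in this section, can be sketched as follows (`key_vectors` is a hypothetical helper name):

```python
def key_vectors(key: bytes):
    """Return (k, M_e, M_o) for a key of at most 32 bytes.

    Shorter keys are zero-padded up to the next defined length (16, 24 or 32 bytes).
    """
    if len(key) > 32:
        raise ValueError("Twofish keys are at most 256 bits")
    target = next(n for n in (16, 24, 32) if len(key) <= n)
    key = key + bytes(target - len(key))
    k = len(key) // 8                                   # k = N / 64
    M = [int.from_bytes(key[4 * i:4 * i + 4], "little") for i in range(2 * k)]
    return k, M[0::2], M[1::2]                          # M_e: even indices, M_o: odd


k, Me, Mo = key_vectors(bytes(range(16)))
print(k, [hex(w) for w in Me], [hex(w) for w in Mo])
```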

A third word vector of length $k$ is also derived from the key. This is done by
taking the key bytes in groups of 8, interpreting them as a vector over GF($2^8$),
and multiplying them by a 4×8 matrix derived from a Reed-Solomon code. Each
result of 4 bytes is then interpreted as a 32-bit word. These words make up the
third vector.

$$\begin{pmatrix} s_{i,0} \\ s_{i,1} \\ s_{i,2} \\ s_{i,3} \end{pmatrix} = \mathrm{RS} \cdot \begin{pmatrix} m_{8i} \\ m_{8i+1} \\ \vdots \\ m_{8i+7} \end{pmatrix}$$

$$S_i = \sum_{j=0}^{3} s_{i,j} \cdot 2^{8j}$$

for $i = 0, \ldots, k-1$, and

$$S = (S_{k-1}, S_{k-2}, \ldots, S_0)$$

Note that S lists the words in "reverse" order. For the RS matrix multiply,
GF($2^8$) is represented by GF(2)[x]/w(x), where $w(x) = x^8 + x^6 + x^3 + x^2 + 1$
is another primitive polynomial of degree 8 over GF(2). The mapping between
byte values and elements of GF($2^8$) uses the same definition as used for the MDS
matrix multiply.
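The derivation of S can be sketched the same way. The RS coefficients below are the ones listed in the published Twofish specification, reproduced here as an assumption that should be checked against that table; the checks that follow rely only on GF($2^8$)-linearity and on the reversed word order, not on the exact coefficients.

```python
W_POLY = 0x14D  # w(x) = x^8 + x^6 + x^3 + x^2 + 1

# 4 x 8 RS matrix as published in the Twofish specification (verify before relying on it).
RS = [
    [0x01, 0xA4, 0x55, 0x87, 0x5A, 0x58, 0xDB, 0x9E],
    [0xA4, 0x56, 0x82, 0xF3, 0x1E, 0xC6, 0x68, 0xE5],
    [0x02, 0xA1, 0xFC, 0xC1, 0x47, 0xAE, 0x3D, 0x19],
    [0xA4, 0x55, 0x87, 0x5A, 0x58, 0xDB, 0x9E, 0x03],
]


def gf_mul(a: int, b: int, poly: int = W_POLY) -> int:
    """Shift-and-XOR multiplication in GF(2)[x]/poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result


def derive_S(key: bytes) -> list:
    """Return S = (S_{k-1}, ..., S_0) for an already padded 16-, 24- or 32-byte key."""
    k = len(key) // 8
    words = []
    for i in range(k):
        group = key[8 * i:8 * i + 8]                          # m_{8i}, ..., m_{8i+7}
        s = [0, 0, 0, 0]
        for row in range(4):
            for col in range(8):
                s[row] ^= gf_mul(RS[row][col], group[col])
        words.append(sum(s[j] << (8 * j) for j in range(4)))  # S_i, little-endian
    return words[::-1]                                        # reverse order, as in the text


print([hex(w) for w in derive_S(bytes([1]) + bytes(15))])
```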

Additional Key Lengths Twofish can accept keys of any byte length up to
256 bits. For key sizes that are not defined above, the key is padded at the end
with zero bytes to the next larger length that is defined. For example, an 80-bit
key $m_0, \ldots, m_9$ would be extended by setting $m_i = 0$ for $i = 10, \ldots, 15$ and
treating it as a 128-bit key.

The Function h The function h takes two inputs, a 32-bit word $X$ and a
list $L = (L_0, \ldots, L_{k-1})$ of 32-bit words of length $k$, and produces one word
of output. This function works in $k$ stages. In each stage, the four bytes are
each passed through a fixed S-box, and XORed with a byte derived from the list.
Finally, the bytes are once again passed through a fixed S-box, and the four
bytes are multiplied by the MDS matrix just as in g. More formally: we split the
words into bytes.

$$l_{i,j} = \lfloor L_i / 2^{8j} \rfloor \bmod 2^8$$

$$x_j = \lfloor X / 2^{8j} \rfloor \bmod 2^8$$

for $i = 0, \ldots, k-1$ and $j = 0, \ldots, 3$. Then the sequence of substitutions and
XORs is applied.

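$$y_{k,j} = x_j, \qquad j = 0, \ldots, 3$$

If $k = 4$ we have

$$
\begin{aligned}
y_{3,0} &= q_1[y_{4,0}] \oplus l_{3,0} \\
y_{3,1} &= q_0[y_{4,1}] \oplus l_{3,1} \\
y_{3,2} &= q_0[y_{4,2}] \oplus l_{3,2} \\
y_{3,3} &= q_1[y_{4,3}] \oplus l_{3,3}
\end{aligned}
$$

If $k \ge 3$ we have

$$
\begin{aligned}
y_{2,0} &= q_1[y_{3,0}] \oplus l_{2,0} \\
y_{2,1} &= q_1[y_{3,1}] \oplus l_{2,1} \\
y_{2,2} &= q_0[y_{3,2}] \oplus l_{2,2} \\
y_{2,3} &= q_0[y_{3,3}] \oplus l_{2,3}
\end{aligned}
$$

In all cases we have

$$
\begin{aligned}
y_0 &= q_1[q_0[q_0[y_{2,0}] \oplus l_{1,0}] \oplus l_{0,0}] \\
y_1 &= q_0[q_0[q_1[y_{2,1}] \oplus l_{1,1}] \oplus l_{0,1}] \\
y_2 &= q_1[q_1[q_0[y_{2,2}] \oplus l_{1,2}] \oplus l_{0,2}] \\
y_3 &= q_0[q_1[q_1[y_{2,3}] \oplus l_{1,3}] \oplus l_{0,3}]
\end{aligned}
$$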

Here, $q_0$ and $q_1$ are fixed permutations on 8-bit values that we will discuss shortly.
The resulting vector of $y_i$'s is multiplied by the MDS matrix, just as in the g
function.

$$\begin{pmatrix} z_0 \\ z_1 \\ z_2 \\ z_3 \end{pmatrix} = \mathrm{MDS} \cdot \begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}$$

$$Z = \sum_{i=0}^{3} z_i \cdot 2^{8i}$$

where $Z$ is the result of h.
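Because $q_0$ and $q_1$ are only described later, the following sketch of h takes them as parameters and uses seeded random permutations as hypothetical stand-ins. The MDS coefficients are those given in the Twofish specification, reproduced as an assumption to verify. The per-lane choice of $q_0$ or $q_1$ at each stage follows the equations above.

```python
import random

V_POLY = 0x169
# MDS matrix as published in the Twofish specification (verify before relying on it).
MDS = [
    [0x01, 0xEF, 0x5B, 0x5B],
    [0x5B, 0xEF, 0xEF, 0x01],
    [0xEF, 0x5B, 0x01, 0xEF],
    [0xEF, 0x01, 0xEF, 0x5B],
]

# Fixed permutation applied to byte lanes j = 0..3 at each stage of h.
STAGE_4 = ("q1", "q0", "q0", "q1")  # only if k = 4, then xor l_3
STAGE_3 = ("q1", "q1", "q0", "q0")  # only if k >= 3, then xor l_2
STAGE_2 = ("q0", "q1", "q0", "q1")  # innermost q of the "all cases" lines, then xor l_1
STAGE_1 = ("q0", "q0", "q1", "q1")  # middle q, then xor l_0
FINAL = ("q1", "q0", "q1", "q0")    # outermost q, no xor


def gf_mul(a, b, poly=V_POLY):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result


def mds_multiply(y):
    """z = MDS . y over GF(2^8), with y and z as lists of four bytes."""
    return [gf_mul(MDS[r][0], y[0]) ^ gf_mul(MDS[r][1], y[1])
            ^ gf_mul(MDS[r][2], y[2]) ^ gf_mul(MDS[r][3], y[3]) for r in range(4)]


def h(X, L, q0, q1):
    """Function h for a list L of k = 2, 3 or 4 words; q0, q1 are 256-entry tables."""
    q = {"q0": q0, "q1": q1}
    k = len(L)
    l = [[(Li >> (8 * j)) & 0xFF for j in range(4)] for Li in L]   # l_{i,j}
    y = [(X >> (8 * j)) & 0xFF for j in range(4)]                   # y_{k,j} = x_j
    stages = []
    if k == 4:
        stages.append((STAGE_4, 3))
    if k >= 3:
        stages.append((STAGE_3, 2))
    stages += [(STAGE_2, 1), (STAGE_1, 0)]
    for lanes, i in stages:
        y = [q[lanes[j]][y[j]] ^ l[i][j] for j in range(4)]
    y = [q[FINAL[j]][y[j]] for j in range(4)]
    z = mds_multiply(y)
    return sum(z[j] << (8 * j) for j in range(4))                   # Z


# Hypothetical stand-ins for q0 and q1: seeded random permutations.
rng = random.Random(7)
toy_q0 = list(range(256))
rng.shuffle(toy_q0)
toy_q1 = list(range(256))
rng.shuffle(toy_q1)
print(hex(h(0x03020100, [0x01234567, 0x89ABCDEF], toy_q0, toy_q1)))
```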



The Key-dependent S-boxes We can now define the S-boxes in the function
g by

$$g(X) = h(X, S)$$

That is, for $i = 0, \ldots, 3$, the key-dependent S-box $s_i$ is formed by the mapping
from $x_i$ to $y_i$ in the h function, where the list $L$ is equal to the vector $S$ derived
from the key.



The Expanded Key Words $K_j$ The words of the expanded key are defined
using the h function.

$$
\begin{aligned}
\rho &= 2^{24} + 2^{16} + 2^{8} + 2^{0} \\
A_i &= h(2i\rho, M_e) \\
B_i &= \mathrm{ROL}(h((2i+1)\rho, M_o), 8) \\
K_{2i} &= (A_i + B_i) \bmod 2^{32} \\
K_{2i+1} &= \mathrm{ROL}((A_i + 2B_i) \bmod 2^{32}, 9)
\end{aligned}
$$

The constant $\rho$ is used here to duplicate bytes; it has the property that for
$i = 0, \ldots, 255$, the word $i\rho$ consists of four equal bytes, each with the value $i$.
The function h is applied to words of this type. For $A_i$ the byte values are $2i$,
and the second argument of h is $M_e$. $B_i$ is computed similarly using $2i+1$ as the
byte value and $M_o$ as the second argument, with an extra rotate over 8 bits. The
values $A_i$ and $B_i$ are combined in a PHT. One of the results is further rotated
by 9 bits. The two results form two words of the expanded key.
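Given any implementation of h, the expanded key follows mechanically. In this sketch `toy_h` is a hypothetical placeholder that ignores its list argument, so only the $\rho$ byte duplication and the PHT with its rotations are illustrated.

```python
MASK32 = 0xFFFFFFFF
RHO = 2**24 + 2**16 + 2**8 + 2**0   # 0x01010101: i * RHO repeats byte i four times


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def expand_key(h, Me, Mo):
    """Return K_0, ..., K_39 for any function h(X, L) supplied by the caller."""
    K = []
    for i in range(20):
        A = h(2 * i * RHO, Me)
        B = rol(h((2 * i + 1) * RHO, Mo), 8)
        K.append((A + B) & MASK32)                # K_{2i}
        K.append(rol((A + 2 * B) & MASK32, 9))    # K_{2i+1}
    return K


def toy_h(X, L):
    """Hypothetical placeholder for h that ignores L; NOT the Twofish h."""
    return X


K = expand_key(toy_h, [0, 0], [0, 0])
print(hex(K[0]), hex(K[1]))  # 0x1010101 0x4040404
```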
The Permutations $q_0$ and $q_1$ The permutations $q_0$ and $q_1$ are fixed permuta-
tions on 8-bit values. Each is constructed from four different 4-bit permutations.
We have investigated the resulting 8-bit permutations, $q_0$ and $q_1$, exten-
sively, and believe them to be no weaker than randomly selected 8-bit
permutations.



3 Analysis of The Key Schedule 

The key schedule has been designed to provide resistance to attack, while also 
providing a great deal of flexibility in implementation. For example, after S has 
been computed from $M_e$ and $M_o$, all remaining key scheduling can be done “on the
fly” during encryption. This allows for very low-memory implementations, and 
for implementations with excellent key agility. In implementations with more 
memory, all the subkeys can be precomputed for improved performance. In im- 
plementations with still more memory, such as on modern high-end processors 
with a reasonable RAM cache size, the effects of the key-dependent S-boxes 
and the MDS matrix multiply can be precomputed, reducing the work per g 
computation to four table lookups and three XORs. 
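The precomputation mentioned above works because the MDS multiply is linear: each byte lane contributes one MDS column scaled by its S-box output, so four 256-entry tables of 32-bit words suffice. The sketch below assumes hypothetical random permutations in place of the key-dependent S-boxes, and MDS coefficients as published in the Twofish specification (to be verified), and checks the table version against a direct computation.

```python
import random

V_POLY = 0x169
MDS = [
    [0x01, 0xEF, 0x5B, 0x5B],
    [0x5B, 0xEF, 0xEF, 0x01],
    [0xEF, 0x5B, 0x01, 0xEF],
    [0xEF, 0x01, 0xEF, 0x5B],
]


def gf_mul(a, b, poly=V_POLY):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result


def build_tables(sboxes):
    """T_j[b] = (MDS column j) * s_j[b], packed into a 32-bit word, for j = 0..3."""
    tables = []
    for j in range(4):
        column = [MDS[r][j] for r in range(4)]
        tables.append([sum(gf_mul(column[r], sboxes[j][b]) << (8 * r) for r in range(4))
                       for b in range(256)])
    return tables


def g_direct(X, sboxes):
    """g as in Section 2.2: S-boxes first, then the MDS multiply."""
    y = [sboxes[j][(X >> (8 * j)) & 0xFF] for j in range(4)]
    z = [0, 0, 0, 0]
    for r in range(4):
        for j in range(4):
            z[r] ^= gf_mul(MDS[r][j], y[j])
    return sum(z[r] << (8 * r) for r in range(4))


def g_tables(X, T):
    """Same result with four table lookups and three XORs."""
    return (T[0][X & 0xFF] ^ T[1][(X >> 8) & 0xFF]
            ^ T[2][(X >> 16) & 0xFF] ^ T[3][(X >> 24) & 0xFF])


# Hypothetical stand-ins for the four key-dependent S-boxes.
rng = random.Random(3)
toy_sboxes = []
for _ in range(4):
    perm = list(range(256))
    rng.shuffle(perm)
    toy_sboxes.append(perm)
T = build_tables(toy_sboxes)
```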

Note that S is only half the size of the key. This was done so that precom- 
putation of the S-boxes and MDS matrix multiply would be sufficiently fast, 
and so that low-memory implementations would not have to take too large a 
performance hit. This means that the g function is slightly different for longer 
keys than for shorter keys. 



3.1 Byte Sequences 

The subkeys in Twofish are generated by using the h function, which can be 
seen as four key-dependent S-boxes followed by an MDS matrix. The input to 
the S-boxes is basically a counter. In this section we analyze the sequences of 
outputs that this construction can generate. 

All key material is used to define key-dependent S-boxes in h, which are then 
used to derive subkeys. Each S-box gets a sequence of inputs, (0, 2, 4, . . . , 38) 
or (1, 3, 5, . . ., 39). The S-box generates a corresponding sequence of outputs. 
The corresponding outputs from the four S-boxes are combined using the MDS 
matrix multiply to produce the sequence of Ai and Bi words, and those words 
are processed with the PHT (with a couple of rotations thrown in) to produce 
a pair of subkey words. Analyzing these byte sequences thus gives us important 
insights about the whole key schedule. 

We can model each byte sequence generated by a key-dependent S-box as 
a randomly selected non-repeating byte sequence of length 20. This allows us 
to make many useful predictions about the likelihood of finding keys or pairs 
of keys with various interesting properties. Because we will be analyzing the 
key schedule using this assumption in the remainder of this section, we should 
discuss how reasonable it is to treat this byte sequence as randomly generated. 
We have not found any statistical deviations between our key-dependent S-boxes
and the random model in any of our extensive statistical tests. 

We are looking at sequences of 20 bytes that are all distinct. There are
$256!/236!$ of those sequences, which is close to $2^{160}$.
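The count $256!/236! = 256 \cdot 255 \cdots 237$ is easy to evaluate in log2 form; a quick check that it is just under $2^{159}$, i.e. close to $2^{160}$:

```python
import math

# 256!/236! = 256 * 255 * ... * 237, the number of sequences of 20 distinct bytes.
log2_count = sum(math.log2(256 - t) for t in range(20))
exact = math.perm(256, 20)           # the same product as an exact integer
print(f"256!/236! = 2^{log2_count:.2f}")
```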



3.2 Equivalent S-box Keys 

We have verified that there are no equivalent S-box keys that generate the same 
sequence of 20 bytes. In the random model, the chance of this happening for the 
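N = 256 case is about $2^{63} \cdot 2^{-160} = 2^{-97}$ (about $2^{63}$ pairs of 32-bit S-box
keys, each matching with probability about $2^{-160}$). This is the chance of such equivalent
S-boxes existing at all. In fact, we recently completed an exhaustive search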
demonstrating that no pair of key inputs to an S-box produces identical S-box 
entries. 



3.3 Byte Difference Sequences 

Let us consider the more general problem of how to get a given 20-byte difference 
sequence between a pair of S-boxes. Suppose we have two S-boxes, each defined 
using 32 bits of key material, which are not equal, but which must be chosen to 
give us a given difference sequence in the XOR of their byte sequences. We can 
estimate the probability of a pair of 4-byte inputs existing with the desired XOR 
difference sequence as $2^{63} \cdot 2^{-160} = 2^{-97}$. Note that this is the probability that
such a pair of inputs exists, not the probability that a random pair of keys will 
have this property. 



3.4 The A and B Sequences 

From the properties of the byte sequences, we can discuss the properties of the 
A and B sequences generated by each key M. 

$$A_i = \mathrm{MDS}(s_0(i, M), s_1(i, M), s_2(i, M), s_3(i, M))$$

Since the MDS matrix multiply is invertible, and since $i$ is different for each
round's subkey words generated, we can see that no A or B value can repeat
itself.

Similarly, we can see from the construction of h that each key byte affects 
exactly one S-box used to generate A or B. Changing a single key byte always 
alters every one of the 20 bytes of output from that S-box, and so always alters 
every word in the 20-word A or B sequence to which it contributes. 

Consider a single byte of output from one of the S-boxes. If we cycle any one 
of the key bytes that contributes to that S-box through all 256 possible values, 
the output of the S-box will also cycle through all 256 possible values. If we take 
four key bytes that contribute to four different S-boxes, and we cycle those four 
bytes through all possible values, then the result of h will also cycle through all 
possible values. This proves that A and B are uniformly distributed for all key 
lengths, assuming the key M is uniformly distributed. 
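The cycling argument for a single byte lane can be checked directly. This sketch assumes the k = 2 form of lane 0 of h and uses seeded random permutations as hypothetical stand-ins for $q_0$ and $q_1$; the property depends only on their being bijections.

```python
import random

# Hypothetical stand-ins for q0 and q1 (any bijections would do).
rng = random.Random(11)
q0 = list(range(256))
rng.shuffle(q0)
q1 = list(range(256))
rng.shuffle(q1)


def lane0(x, l1, l0):
    """Byte lane 0 of h for k = 2: q1[q0[q0[x] xor l1] xor l0]."""
    return q1[q0[q0[x] ^ l1] ^ l0]


x, l1, l0 = 0x3C, 0x5A, 0x11
outputs_l0 = sorted(lane0(x, l1, v) for v in range(256))  # cycle the outer key byte
outputs_l1 = sorted(lane0(x, v, l0) for v in range(256))  # cycle the inner key byte
print(outputs_l0 == list(range(256)), outputs_l1 == list(range(256)))  # True True
```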
3.5 Difference Sequences in A and B

Let us also consider difference sequences. If we have a specific difference sequence 
we want to see in A, we are faced with an interesting problem: since the MDS 
matrix multiply is XOR-linear, each desired output XOR from the matrix multiply
allows only one possible input XOR. This means that: 

1. A zero output XOR difference in A can occur only with a zero output XOR 
difference in all four of the byte sequences used to build A. 

2. Only 1020 possible output differences (out of the $2^{32}$) in $A_i$ can occur with
a single “active” (altered) S-box. Most differences require all four S-boxes 
used to form Ai to be active. 

3. Each desired output XOR in A requires a specific output XOR in each of 
the four byte sequences used to form A. This means that getting any de- 
sired difference sequence into all 20 Ai values requires getting a desired XOR 
sequence into all four 20-byte sequences. (Note that if the desired output 
XOR in Ai is an appropriate value, up to three of the four byte sequences 
can be identical without much trouble, simply by leaving their key material 
unchanged.) As mentioned above, this is very unlikely to be possible for a 
randomly chosen difference pattern in the A sequence. (There are of course 
difference sequences of $A_i$'s that can occur.)
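Point 2 can be confirmed by enumeration. The sketch assumes the MDS coefficients as published in the Twofish specification (to be verified), but the count $1020 = 4 \cdot 255$ holds for any invertible 4×4 matrix over GF($2^8$), since distinct input differences map to distinct output differences.

```python
V_POLY = 0x169
MDS = [
    [0x01, 0xEF, 0x5B, 0x5B],
    [0x5B, 0xEF, 0xEF, 0x01],
    [0xEF, 0x5B, 0x01, 0xEF],
    [0xEF, 0x01, 0xEF, 0x5B],
]


def gf_mul(a, b, poly=V_POLY):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result


def mds(y):
    return [gf_mul(MDS[r][0], y[0]) ^ gf_mul(MDS[r][1], y[1])
            ^ gf_mul(MDS[r][2], y[2]) ^ gf_mul(MDS[r][3], y[3]) for r in range(4)]


# Output XOR differences reachable with exactly one active (changed) input byte.
single_active = set()
for lane in range(4):
    for v in range(1, 256):
        dy = [0, 0, 0, 0]
        dy[lane] = v
        single_active.add(tuple(mds(dy)))  # XOR-linearity: output diff = MDS . input diff
print(len(single_active))  # 1020
```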

The above analysis is of course also valid for the B sequence. 



3.6 The Sequence $(K_{2i}, K_{2i+1})$

As $A_i$ and $B_i$ are uniformly distributed (over all keys), so are all the $K_i$. As
all pairs $(A_i, B_i)$ are distinct, and the PHT followed by the rotation is invertible,
all the pairs $(K_{2i}, K_{2i+1})$ are distinct, although it might happen that $K_i = K_j$
for some pair of $i$ and $j$.
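The distinctness of the subkey pairs follows because the PHT with its rotation can be inverted. A sketch of both directions (helper names are ours):

```python
MASK32 = 0xFFFFFFFF


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def ror(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def subkey_pair(A, B):
    """(A_i, B_i) -> (K_{2i}, K_{2i+1}), following the definitions in Section 2.3."""
    return (A + B) & MASK32, rol((A + 2 * B) & MASK32, 9)


def invert_subkey_pair(K_even, K_odd):
    """Recover (A_i, B_i); the existence of this inverse is why the pairs stay distinct."""
    s = ror(K_odd, 9)              # A + 2B mod 2^32
    B = (s - K_even) & MASK32      # (A + 2B) - (A + B) = B
    A = (K_even - B) & MASK32
    return A, B


print(invert_subkey_pair(*subkey_pair(0x12345678, 0x9ABCDEF0)) == (0x12345678, 0x9ABCDEF0))  # True
```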

3.7 Difference Sequences in the Subkeys 

Each difference sequence in A and B translates into a difference sequence in
$(K_{2i}, K_{2i+1})$. However, while it is natural to consider A and B difference
sequences in terms of XOR differences, subkeys can reasonably be considered either
as XOR differences or as differences modulo $2^{32}$. Thus, we may discuss difference
sequences:

$$D[i, M, M^*] = (K_{i,M} - K_{i,M^*}) \bmod 2^{32}$$

$$X[i, M, M^*] = K_{i,M} \oplus K_{i,M^*}$$

where the difference is computed between the key values $M$ and $M^*$.



3.8 XOR Differences in the Subkeys 

Each round, the subkeys are added to the results of the PHT of two g functions, 
and the results of those additions are XORed into half of the cipher block. An
XOR difference in the subkeys has a fairly high probability of passing through
the addition operation and ending up in the cipher block. (The probability of 
this is determined by the Hamming weight of the XOR difference, not counting 
the highest-order bit.) However, to get into the subkeys, a XOR difference must 
first pass through the first addition. 

Consider

$$x + y = z$$

$$(x \oplus \delta_0) + y = z \oplus \delta_1$$

Let $k$ be the number of bits set in $\delta_0$, not counting the highest-order bit. Then,
the highest probability value for $\delta_1$ is $\delta_0$, and the probability that this will hold
is $2^{-k}$. This is true because addition and XOR are very closely related operations.
The only difference between the two is the carry between bit positions. If flipping
a given bit changes the carry into the next bit position, this alters the output
XOR difference. This happens with probability 1/2 per bit. The situation is more 
complex for multiple adjacent bits, but the general rule still holds: for every bit 
in the XOR difference not in the high-order bit position, the probability that the 
difference will pass through correctly is cut in half. 
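The $2^{-k}$ rule can be checked exhaustively on a reduced word size. The sketch below uses 8-bit words instead of 32-bit ones, an assumption made only to keep the enumeration small, and compares the measured probability with $2^{-k}$.

```python
def survival_probability(delta: int, bits: int = 8) -> float:
    """Exact Pr[(x xor delta) + y == (x + y) xor delta] over all x, y of `bits` bits."""
    mask = (1 << bits) - 1
    hits = 0
    for x in range(1 << bits):
        xd = x ^ delta
        for y in range(1 << bits):
            if ((xd + y) & mask) == (((x + y) & mask) ^ delta):
                hits += 1
    return hits / (1 << (2 * bits))


def predicted_probability(delta: int, bits: int = 8) -> float:
    """2^-k, where k counts the set bits of delta below the highest-order bit."""
    k = bin(delta & ((1 << (bits - 1)) - 1)).count("1")
    return 2.0 ** -k


print(survival_probability(0x05), predicted_probability(0x05))  # 0.25 0.25
```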

For the subkey generation, consider an XOR difference, $\delta_0$, in A. This affects
two subkey words:

$$K_{2i} = A_i + B_i$$

$$K_{2i+1} = \mathrm{ROL}(A_i + 2B_i, 9)$$

where the additions are modulo $2^{32}$. If we assume these XOR differences propagate
independently in the two subkeys (which appears to be the case), we see that this
leads to an XOR difference of $\delta_0$ in the even subkey word with probability $2^{-k}$,
and the XOR difference $\mathrm{ROL}(\delta_0, 9)$ in the odd subkey with the same probability.
The most probable XOR difference in the round's subkey block thus occurs with
probability $2^{-2k}$. A desired XOR difference sequence for all 20 pairs of subkey
words is thus quite difficult to get to work when $k > 3$, assuming the desired
XOR difference sequence can be created in the A sequence at all.

When the XOR difference is in B, the result is slightly more complicated; the
most probable XOR difference in a round's pair of subkey words occurs with probability
either $2^{-(2k-1)}$ or $2^{-2k}$, depending on whether or not the XOR difference in B
covers the next-to-highest-order bit.
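Under the same independence assumption, the cost of the most probable subkey XOR difference can be tabulated for a difference placed in A or in B. This sketch (our own helper names) returns $-\log_2$ of that probability for 32-bit words; in the B case the odd word sees the difference shifted left by one bit, which is where the next-to-highest-order bit matters.

```python
MASK32 = 0xFFFFFFFF
LOW31 = 0x7FFFFFFF  # every bit except the highest-order one


def addition_cost(delta: int) -> int:
    """k in the 2^-k rule: set bits of delta, not counting bit 31."""
    return bin(delta & LOW31).count("1")


def subkey_xor_cost(delta: int, in_B: bool) -> int:
    """-log2 of the most probable XOR difference in (K_{2i}, K_{2i+1}).

    Assumes, as the text does, that the two additions behave independently.
    """
    even = addition_cost(delta)                        # K_{2i} = A + B
    if in_B:
        odd = addition_cost((delta << 1) & MASK32)     # 2B carries delta shifted left
    else:
        odd = addition_cost(delta)                     # A enters A + 2B unshifted
    return even + odd                                  # ROL(., 9) does not change the cost


print(subkey_xor_cost(0x00000003, in_B=True), subkey_xor_cost(0x40000001, in_B=True))  # 4 3
```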

An XOR difference in A or B is easy to analyze in terms of additive differences
modulo $2^{32}$: an XOR difference with $k$ active bits has $2^k$ equally likely additive
differences. Note that if we have an additive difference in A, we get it in both
subkey words, just rotated left nine bits in the odd subkey word. Thus, $k$-bit
XOR differences lead to a given additive difference in a pair of subkey words with
probability $2^{-k}$. (The rotation does not really complicate things much for the
attacker, who knows where the changed bits are.)
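The claim about additive differences can be checked exhaustively on 8-bit words (a reduction for illustration). When the highest-order bit is not among the $k$ active bits, the $2^k$ sign patterns give $2^k$ distinct additive differences, each equally likely:

```python
from collections import Counter


def additive_differences(delta: int, bits: int = 8) -> Counter:
    """Distribution of the modular difference ((x xor delta) - x) mod 2^bits over all x."""
    mask = (1 << bits) - 1
    return Counter(((x ^ delta) - x) & mask for x in range(1 << bits))


print(sorted(additive_differences(0x05).items()))  # [(3, 64), (5, 64), (251, 64), (253, 64)]
```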

Note that when additive subkey differences modulo $2^{32}$ are used in an attack,
they survive badly through the XOR with the plaintext block. We estimate that 
XOR differences are much more likely to be directly useful in mounting an attack. 
3.9 Key-dependent Characteristics and Weak Keys



The concept of a key-dependent characteristic seems to have been introduced
in the cryptanalysis of Lucifer, and also appears in an analysis of IDEA.¹ The
idea is that certain iterative properties of the block cipher useful to an attacker
become more effective against the cipher for a specific subset of keys.

A differential attack on Twofish may consider XOR-based differences, addi- 
tive differences, or both. If an attacker sends XOR differences through the PHT 
and subkey addition steps, his differential characteristic probabilities will be
dependent on the subkey values involved. In general, low-weight subkeys will give
an attacker some advantage, but this advantage is relatively small. (Zero bits in 
the subkeys improve the probabilities of cleanly getting XOR-based differential 
characteristics through the subkey addition.) Since there appears to be no spe- 
cial way to choose the key to make the subkey sequence especially low weight, 
we do not believe this kind of key-dependent differential characteristic will have 
any relevance in attacking Twofish. 

A much more interesting issue in terms of key-dependent characteristics is 
whether the key-dependent S-boxes are ever generated with especially high prob- 
ability differential or high bias linear characteristics. The statistical analysis pre- 
sented earlier shows that the best linear and differential characteristics over all 
possible keys are still quite unlikely. 

Note that the structure of both differential and linear attacks in Twofish is 
such that such attacks appear to generally require good characteristics through 
at least three of the four key-dependent S-boxes (if not all four), so a single 
high-probability differential or linear characteristic for one S-box will not create 
a weakness in the cipher as a whole. Our statistical testing has allowed us to 
estimate that few or no keys result in a single S-box with a differential charac- 
teristic of probability higher than 24/256 for any length key, and with a linear 
characteristic with bias higher than 108/256. These probabilities do not allow for 
practical differential or linear attacks. Further, for an attacker to mount a dif- 
ferential or linear attack, it appears to be necessary to get very high-probability 
differential or linear characteristics in all four S-boxes at once. 



3.10 Related-key Cryptanalysis 



Related-key cryptanalysis uses a cipher's key schedule to break plaintexts
encrypted with related keys. In its most advanced form, differential related-key
cryptanalysis, both plaintexts and keys with chosen differentials are used to
recover the keys. This type of analysis has had considerable success against
ciphers with simplistic key schedules (e.g., GOST and 3-Way) and is a realistic
attack in some circumstances. A conventional attack is usually judged in terms
of the number of plaintexts or ciphertexts needed for the attack, and the level
of access to the cipher needed to get those texts (e.g., known plaintext, chosen
plaintext, adaptive chosen plaintext); in a related-key attack, we must add the
requirement for encryptions to occur under two different, but related, keys.

¹ See [...] for further cryptanalysis of IDEA weak keys.



3.11 Resistance to Related-key Slide Attacks 

A “slide” attack occurs in an iterated cipher when the encryption of one block
for rounds 1 through $n$ is the same as the encryption of another block for rounds
$s+1$ to $s+n$. An attacker can look at two encryptions, and can slide the rounds
forward in one of them relative to another. S-1 can be broken with a slide
attack. Travois has identical round functions, and can also be broken with a
slide attack. Conventional slide attacks allow one to break the cipher with only
known- or chosen-plaintext queries; however, as we shall see next, there is a
generalization to related-key attacks as well.

Related-key slide attacks were first discovered by Biham in his attack on a
DES variant. To mount a related-key slide attack on Twofish, an attacker
must find a pair of keys $M, M^*$ such that the key-dependent S-boxes in g are
unchanged, but the subkey sequences slide down one round. This amounts to
finding, for each of the eight byte-permutations used for subkey generation, a
change in the keys such that:

$$s_i(j, M) = s_i(j + 2s, M^*)$$

for $n$ values of $j$. In total, this requires $8n$ of these relations to hold.

Let us look in more detail for a fixed key $M$. Let $m \in \{5, \ldots, 8\}$ be the number
of S-boxes used to compute the round keys that are affected by the difference
between $M$ and $M^*$. Observe that $m \ge 5$ due to the restriction that S cannot
change and the properties of the RS matrix that at least 5 inputs must change
to keep the output constant. There are at most $\binom{8}{m} 2^{32m}$ possible choices of
$M^*$. We have a total of $nm$ 8-bit relations that need to be satisfied. The expected
number of $M^*$ that satisfy these relations is thus $\binom{8}{m} 2^{(32-8n)m}$. For $n > 4$
this is dominated by the case $m = 5$; we will ignore the other cases for now. So
for each $M$ we can expect about $2^{166-40n}$ keys $M^*$ that support a slide attack for
$n > 4$. This means that any specific key is unlikely to support a slide attack with
$n > 4$. Over all possible key pairs, we expect $2^{422-40n}$ pairs $M, M^*$ for which a
slide of $n > 4$ occurs. Thus, it is unlikely that a pair exists at all with $n > 10$.
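The counting argument can be evaluated numerically. This sketch (our own) keeps every $m = 5, \ldots, 8$ rather than only $m = 5$, and finds the smallest $n$ for which the expected number of slid key pairs over all $2^{256}$ keys drops below one.

```python
import math


def log2_expected_pairs(n: int, key_bits: int = 256) -> float:
    """log2 of 2^key_bits * sum over m = 5..8 of C(8, m) * 2^((32 - 8n) m)."""
    per_key = sum(math.comb(8, m) * 2.0 ** ((32 - 8 * n) * m) for m in range(5, 9))
    return key_bits + math.log2(per_key)


def first_n_without_pairs() -> int:
    """Smallest n > 4 for which fewer than one slid pair is expected."""
    n = 5
    while log2_expected_pairs(n) >= 0:
        n += 1
    return n


for n in (5, 8, 10, 11):
    print(n, round(log2_expected_pairs(n), 1))
```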



Resistance to Related-key Differential Attacks A related-key differential 
attack seeks to mount a differential attack on a block cipher through the key, as 
well as or instead of through the plaintext/ciphertext port. Against Twofish, such 
an attack must control the subkey difference sequence for at least the rounds in 
the middle. For the sake of simplifying discussions of the attack, let us consider 
an attacker who wants to put a chosen subkey difference into the middle twelve 
rounds’ subkeys. That is, he wants to change M to M*, and control D[i, M, M*] 
for $i = 12, \ldots, 35$. At the same time, he needs to keep the g function, and thus the
key S, from changing. All else being equal, the longer the key, the more freedom
an attacker has to mount a related-key differential attack. We thus will assume
the use of 256-bit keys for the remainder of this section. Note that a successful
related-key attack on 128- or 192-bit keys that gets only zero subkey differences
in the rounds whose subkey differences it must control translates directly to an
equivalent related-key attack on 256-bit keys.

Consider the position of the attacker if he attempts a related-key differential 
attack with different S keys. This must result in different g outputs for all inputs, 
since we know that there are no pairs of S values that lead to identical S-boxes. 
Assuming the pair of S values does not lead to linearly-related S-boxes, it will 
not be possible to compensate for this change in S with changes in the subkeys 
in single rounds. The added difficulty is approximately that of adding 24 active 
S-boxes to the existing related-key attack. For this reason, we believe that any 
useful related-key attack will require a pair of keys that keeps S unchanged. 



The Zero Difference Case The simplest related-key attack to analyze is the 
one that keeps both S and also the middle twelve rounds’ subkeys unchanged. It 
thus seeks to generate identical A and B sequences for twelve rounds, and thus 
to keep the individual byte sequences used to derive A and B identical. 

The RS code used to derive S from M strictly limits the ways an attacker 
can change M without altering S. The attacker must try to keep the number 
of active subkey generating S-boxes as low as possible, since each active S-box 
is another constraint on his attack. The attacker can keep the number of active 
S-boxes down to five without altering S, and so this is what he should do. With 
only the key bytes affecting these five subkey generation S-boxes active, he can 
alter between one and four bytes in all five S-boxes; the nature of the RS matrix 
is that if he needs to alter four bytes in any one of these S-boxes, he must alter 
bytes in all five. In practice, in order to maximize his control over the byte 
sequences generated by these S-boxes, he must alter four bytes in all five active 
S-boxes. 

To get zero subkey differences, the attacker must get zero differences in the 
byte sequences generated by all five active S-boxes. Consider a single such byte 
sequence: The attacker tries to find a pair of four-byte key inputs such that they 
lead to identical byte sequences in the middle twelve rounds, which means the 
middle twelve bytes. There are $2^{63}$ pairs of key inputs from which to choose, and
about $2^{96}$ possible byte sequences available. If the byte sequences behave
more-or-less like random functions of the key inputs, this implies that it is extremely
unlikely that an attacker can find a pair of key inputs that will get identical byte
sequences in these middle twelve rounds. We discussed this kind of analysis of byte
sequences earlier in Section 3. From this analysis, we would not expect to see a pair
of keys for even one S-box with more than eight successive bytes unchanged, 
and we would expect even eight successive bytes of unchanged byte sequence to 
require control of all four key bytes into the S-box. We would expect a specific 
pair of key bytes to be required to generate these similar byte sequences. 
To extend this to five active S-boxes, we expect there to be, at best, a single
pair of values for the twenty active key bytes that leave the middle eight subkeys 
unchanged. 



Other Difference Sequences An attacker who has control of the XOR
difference sequences in $A_i$, $B_i$ does not necessarily have great control over the XOR or
modulo $2^{32}$ difference sequence that appears in the subkeys.

First, we must consider the context of a related-key differential attack. The 
attacker does not generally know all of the key bytes generating either Ai or Bi. 
Instead, he knows the XOR difference sequence in Ai and Bi. 

Consider an $A_i$ value with an XOR difference of $\delta$. If the Hamming weight
of $\delta$ is $k$, not including the high-order bit, then the best estimate for the XOR
difference that ends up in the two subkey words for a given round generally has
probability about $2^{-2k}$. (Control of the $A_i$, $B_i$ XOR difference sequence does not
make controlling the subkey XOR differences substantially easier.)

Consider an $A_i$ value with an XOR difference of $\delta$. If the Hamming weight of
$\delta$ is $k$, then the best estimate for the modulo $2^{32}$ difference of the two subkey
words for a given round has probability about $2^{-k}$.

This points out one of the difficulties in mounting any kind of successful
related-key attack with nonzero $A_i$, $B_i$ difference sequences. If an attacker can
find a difference sequence for $A_i$, $B_i$ that keeps $k = 3$, and needs to control the
subkey differences for twelve rounds, he has a probability of about $2^{-72}$ of getting
the most likely XOR subkey difference sequence, and about $2^{-36}$ of getting the
most likely modulo $2^{32}$ difference sequence.
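The figures for $k = 3$ over twelve rounds follow from the per-round estimates above; a one-line check (helper name is ours):

```python
def log2_best_probabilities(k: int, rounds: int = 12):
    """(XOR, modular) log2 probabilities of the most likely subkey difference sequence.

    Per round: about 2^-2k for the XOR difference in both subkey words and
    about 2^-k for the modulo 2^32 difference, as estimated in the text.
    """
    return -2 * k * rounds, -k * rounds


print(log2_best_probabilities(3))  # (-72, -36)
```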



Probability of a Successful Attack With One Related-Key Query We
consider the use of the RS matrix in deriving S from M to be a powerful defense
against related-key differential attacks, because it forces an attacker to keep at
least five key generation S-boxes active. Our analysis suggests that any useful
control of the subkey difference sequence requires that each active S-box in the
attack have all four key bytes changed.

Further, our analysis suggests that, for nearly any useful difference sequence,
each active S-box in the attack has a specific pair of defining key bytes it needs
to work. An attacker specifying his key relation in terms of bytewise XOR has
five pairs of sequences of four key bytes each, which he wants to get. This leaves
him with a probability of a pair of keys with his desired relation actually leading
to the desired attack of about $2^{-160}$, which moves the attack totally outside the
realm of practical attacks.

So long as an attacker is unable to improve this, either by finding a way to get 
useful difference sequences into the subkeys without having so many active key 
bytes, or by finding a way to mount related-key attacks with different S values 
for the different keys, we do not believe that any kind of related key differential 
attack is feasible. 

Note the implication of this: Clever ways to control a couple of extra rounds'
subkey differences are not going to make the attacks feasible, unless they also
allow the attacker to use far fewer active key bytes. For reference, note that with
one altered key byte per active subkey generation S-box, the attacker ends up
with a $2^{-40}$ probability that a pair of related keys will yield an attack; with
two key bytes per active S-box, this drops to $2^{-80}$; with three key bytes per
active S-box, it drops to $2^{-120}$. In practice, this means that any key relation
requiring more than one byte of key changed per active S-box appears to be
impractical.
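These probabilities follow from five active S-boxes and 8 bits of required key agreement per changed key byte; a minimal check (the helper is ours):

```python
ACTIVE_SBOXES = 5  # the RS code forces at least five active subkey-generation S-boxes


def log2_pair_probability(bytes_per_sbox: int, active: int = ACTIVE_SBOXES) -> int:
    """Each needed key byte must take one specific value: 8 bits per byte per S-box."""
    return -8 * bytes_per_sbox * active


for b in range(1, 5):
    print(b, log2_pair_probability(b))
```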

3.12 Conclusions 

Our analysis suggests that related-key attacks against the full Twofish are not 
workable. Note, however, that we have spent less time working on resistance to 
chosen key attacks, such as will be available to an attacker if Twofish is used in 
the straightforward way to define a hash function. For this reason, we recommend 
that more analysis be done before Twofish is used in the straightforward way 
as a hash function, and we note that it appears to be much more secure to use 
Twofish in this way with 128-bit keys than with 256-bit keys, despite the fact 
that this also slows the speed of a hash function down by a factor of two. 

4 Open Questions 

Several questions remain open regarding the strength of the Twofish key
schedule. These include the following:

1. We have discussed differential related-key attacks within a certain set of
assumptions, including the assumption that the subkey generation mechanism
has certain more-or-less random properties. We do not have a stronger ar- 
gument than our intuition and statistical tests that this is the case. A proof 
or stronger argument in either direction would be of great interest. 

2. We have done some analysis (not reflected here for space reasons) on partial 
chosen key attacks on Twofish. Still remaining are issues raised by the desire 
to use Twofish in some Davies-Meyer hashing mode. What kind of collision
resistance might we expect in this case?

3. We have assumed that the derivation of $q_0$ and $q_1$ introduces no weaknesses.
Further analysis of this construction, as well as our larger S-box construction 
methods, would be of interest. 

4. We have discussed related-key slide attacks. There are many other ways to 
reorder the round subkeys. Do any of these ways lead to attacks on the 
cipher? 
Abstract. This paper investigates some security properties of basic
substitution-permutation encryption networks (SPNs) by studying the
nonlinearity distribution and the XOR table distribution. Based on the
idea that mixing small weak transformations results in a large strong
cipher, we provide some evidence which shows that a basic SPN converges
to a randomly generated s-box with the same dimensions as the SPN
after a sufficient number of rounds. We also present a new differential-like
attack which is easy to implement and outperforms the classical
differential cryptanalysis on the basic SPN structure. In particular, it is shown
that 64-bit SPNs with 8x8 s-boxes are resistant to our attack after 12
rounds. All of the above effort may be regarded as the first step towards
provable security for SPN cryptosystems.

Keywords: block cipher, nonlinearity, XOR table, differential attack, 
provable security. 



1 Introduction 

Substitution-permutation encryption networks (SPNs) were first suggested by Feistel as a simple and effective implementation of private-key block ciphers (symmetric ciphers) based on the concepts of “confusion” and “diffusion” introduced by Shannon. An SPN is constructed from a number of rounds of nonlinear substitutions (s-boxes) followed by bit permutations. Keying the network can be accomplished by XORing the key bits with the data bits before each round of substitutions and after the last round. The key bits associated with each round are derived from the master key according to the key scheduling algorithm. An example of a small SPN with N = 16, n = 4 and R = 3 is illustrated in Fig. 1, where N represents the block size of the SPN consisting of R rounds of n × n s-boxes.

There are two powerful classes of cryptanalytic attacks that can be mounted against block ciphers such as SPNs. Differential cryptanalysis […] exploits a highly probable differential characteristic derived from the XOR tables of the s-boxes […]. Linear cryptanalysis […] depends on the best linear approximation, which is directly related to nonlinearity, an important cryptographic property. More details of these attacks on SPNs can be found in […]. On the other hand, it has been proved that completeness or nondegeneracy […] can be achieved in the design of SPN cryptosystems. Avalanche characteristics are well studied in […]

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 43-…, 1999.
Fig. 1. SPN with N = 16, n = 4 and R = 3



and […]. All of the above suggests that the basic SPN has many desirable and predictable cryptographic properties useful for the design of cryptosystems.

In this paper, an estimate and an upper bound on the nonlinearity distribution of bijective (invertible) s-boxes are presented. Based on the experimental results on nonlinearity and XOR table distribution, we show that the basic SPN resembles random bijective s-boxes of the same size as the number of rounds increases, i.e., it converges to the ideal cipher. In addition, we present a practical differential-like attack on basic SPNs which exploits the Markov chain model based on the number of active s-boxes […]. Our attack is effective regardless of the key-scheduling algorithm and is more efficient than classical differential cryptanalysis. From the attack, we are able to find some hints towards proving the security of SPN cryptosystems.



2 Background 

2.1 Nonlinearity 

A Boolean function f(X) is an affine function if $f(X) = A \cdot X \oplus b$, where $X, A \in \{0,1\}^n$ and $b \in \{0,1\}$; “·” is the dot product and “⊕” is the XOR operation. Affine functions with b = 0 are called linear functions. The set of all n-bit affine functions is denoted by $\mathcal{A}_n$, and the set of all n-bit Boolean functions by $\mathcal{F}_n$.

The Hamming weight of a function $f \in \mathcal{F}_n$ is the number of ones in its truth table, denoted by wt(f). The Hamming distance between two functions $f, g \in \mathcal{F}_n$ is defined as $wt(f \oplus g)$. A function $f \in \mathcal{F}_n$ with $wt(f) = 2^{n-1}$ is said to be balanced.

We define the nonlinearity of a Boolean function $f \in \mathcal{F}_n$ as its minimum Hamming distance to all affine functions, denoted by

$$NL(f) = \min_{g \in \mathcal{A}_n} wt(f \oplus g). \tag{1}$$
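Definition (1) can be checked directly for small n. The Python sketch below (helper names are ours, not from the paper) represents a Boolean function by its truth table, a list of 0/1 values indexed by the integer value of X, and takes the minimum distance over all $2^{n+1}$ affine functions $A \cdot X \oplus b$. It is meant for illustration only, since its cost grows as $4^n$.

```python
from itertools import product

def dot(a: int, x: int) -> int:
    """Dot product A·X over GF(2): parity of the bitwise AND."""
    return bin(a & x).count("1") & 1

def affine_truth_table(a: int, b: int, n: int) -> list[int]:
    """Truth table of g(X) = A·X ⊕ b for all X in {0,1}^n."""
    return [dot(a, x) ^ b for x in range(2 ** n)]

def nonlinearity(f: list[int]) -> int:
    """NL(f) = min over affine g of wt(f ⊕ g), straight from definition (1)."""
    n = len(f).bit_length() - 1
    if len(f) != 2 ** n:
        raise ValueError("truth table length must be a power of two")
    best = len(f)
    for a, b in product(range(2 ** n), (0, 1)):
        g = affine_truth_table(a, b, n)
        distance = sum(fi ^ gi for fi, gi in zip(f, g))  # wt(f ⊕ g)
        best = min(best, distance)
    return best

# Example: f(x1, x2, x3) = x1*x2 XOR x3, a balanced 3-bit function
f = [((x >> 2) & (x >> 1) & 1) ^ (x & 1) for x in range(8)]
print(nonlinearity(f))  # prints 2
```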
Let $\mathcal{O}_n$ denote the set of all non-zero linear combinations of the output functions of an n × n s-box, i.e., $\mathcal{O}_n = \{f \mid f = a_1 \cdot f_1 \oplus \cdots \oplus a_n \cdot f_n\}$, where $a_i \in \{0,1\}$, $(a_1, \ldots, a_n) \neq (0, \ldots, 0)$, and $f_i$ is the i-th output function of the s-box. Then the nonlinearity of the s-box is defined as the minimum nonlinearity of all functions in the set $\mathcal{O}_n$:

$$NL(S) = \min_{f \in \mathcal{O}_n} NL(f). \tag{2}$$
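Evaluating (2) by brute force repeats (1) for $2^n - 1$ functions. A faster route, swapped in here and not described in the text, is the standard Walsh-Hadamard identity $NL(f) = 2^{n-1} - \tfrac{1}{2}\max_{A} |W_f(A)|$ with $W_f(A) = \sum_X (-1)^{f(X) \oplus A \cdot X}$. The sketch assumes an s-box given as a Python list mapping each n-bit input to its n-bit output; the non-zero mask `a` selects the combination $a_1 f_1 \oplus \cdots \oplus a_n f_n$ as the parity of `a & S[X]`. Function names are illustrative.

```python
import random

def walsh_hadamard(signs: list[int]) -> list[int]:
    """Fast Walsh-Hadamard transform of a ±1 sequence of length 2^n (returns a new list)."""
    w = list(signs)
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                u, v = w[j], w[j + h]
                w[j], w[j + h] = u + v, u - v
        h *= 2
    return w

def boolean_nl(f: list[int]) -> int:
    """NL(f) = 2^(n-1) - max|W_f| / 2 for a 0/1 truth table f."""
    w = walsh_hadamard([1 - 2 * bit for bit in f])  # 0 -> +1, 1 -> -1
    return len(f) // 2 - max(abs(v) for v in w) // 2

def sbox_nl(sbox: list[int], n: int) -> int:
    """NL(S): minimum NL over all non-zero output combinations (the set O_n)."""
    best = 2 ** n
    for a in range(1, 2 ** n):  # a != 0 selects which output bits are XORed
        f = [bin(a & y).count("1") & 1 for y in sbox]
        best = min(best, boolean_nl(f))
    return best

random.seed(1)
n = 8
sbox = list(range(2 ** n))
random.shuffle(sbox)  # a random bijective 8 x 8 s-box
print(sbox_nl(sbox, n))
```

For a random bijective 8 × 8 s-box, the printed value should usually fall in the 80 to 98 range discussed in Section 3.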



2.2 XOR Table 

A dynamic property of an s-box is its XOR difference table. For a given input difference, it lists the input values to the s-box which generate each corresponding output difference. We define the entry of the XOR difference table of an n × n s-box for input difference ΔX and output difference ΔY, where $\Delta X, \Delta Y \in \{0,1\}^n$ and $\Delta X, \Delta Y \neq 0$, as follows:

$$XOR(\Delta X, \Delta Y) = \{X \mid X \in \{0,1\}^n,\ S(X) \oplus S(X \oplus \Delta X) = \Delta Y\}. \tag{3}$$

If each entry of the XOR distribution table is replaced by the number of elements in that entry, we call this number an XOR value and the new table the XOR table. The largest value in the XOR table is called the XOR value of the s-box, denoted by $XOR^*$.
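The XOR table of definition (3) and the value $XOR^*$ can be tabulated directly, as in the Python sketch below. The 4 × 4 s-box is a hypothetical example chosen only for illustration; any bijective list of $2^n$ outputs works. Entry `table[dx][dy]` holds the number of elements of $XOR(\Delta X, \Delta Y)$.

```python
def xor_table(sbox: list[int]) -> list[list[int]]:
    """table[dx][dy] = |{X : S(X) ^ S(X ^ dx) = dy}|, the size of XOR(dx, dy) in (3)."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for dx in range(size):
        for x in range(size):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table

def max_xor_value(table: list[list[int]]) -> int:
    """XOR*: the largest entry over non-zero input and output differences."""
    return max(table[dx][dy]
               for dx in range(1, len(table))
               for dy in range(1, len(table)))

# Hypothetical 4 x 4 s-box, used purely as an example
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
t = xor_table(SBOX)
print(max_xor_value(t))
```

Every row of the table sums to $2^n$ and every entry is even, since X and X ⊕ ΔX always contribute in pairs.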



3 Nonlinearity Distribution 

An output ciphertext bit of an SPN can be described as a nonlinear function of 
the input plaintext and the key bits. The nonlinearity of this function depends 
on the nonlinearity of the s-boxes, the only nonlinear components in the SPN 
cipher. If the nonlinearities of the s-boxes are very small, the cipher would be 
subject to a linear attack which makes use of a linear approximation to com- 
pute the key bits. Even if the nonlinearities of the s-boxes are made very high, 
it is still unknown whether the final SPN is highly nonlinear. However, Ritter
[…] uses experiments to show that the mixing constructions (permutations in
the basic SPN structure) produce nonlinearity levels and distributions similar 
to those of an ideal cipher, i.e., a keyed look-up table of sufficient size. So what 
is the nonlinearity distribution? In this section we investigate the nonlinearity 
distribution of balanced Boolean functions and bijective s-boxes. (We also ob- 
tained similar results for random Boolean functions and s-boxes. We leave them 
out since they are not related to this work.) 

Lemma 1. For a bijective s-box, all non-zero linear combinations of the output 
functions are balanced. 



Lemma 2. The probability that the nonlinearity of a randomly selected n-bit balanced Boolean function f is equal to 2i is upper bounded by

$$\Pr(NL(f) = 2i) \le \frac{(2^{n+1} - 2)\binom{2^{n-1}}{i}^2}{\binom{2^n}{2^{n-1}}}, \tag{4}$$

where “=” holds if $i < 2^{n-3}$.
Proof. The total number of balanced affine functions is $2^{n+1} - 2$. For each of them, there are $\binom{2^{n-1}}{i}^2$ balanced functions at Hamming distance 2i (flip i of its ones and i of its zeros). So the total number of balanced functions with nonlinearity 2i is upper bounded by their product. For $i < 2^{n-3}$, the bound is tight because these balanced functions are distinct: two balanced affine functions that are neither equal nor complementary are at distance $2^{n-1} > 4i$, so no function lies at distance 2i from both. The result follows by noting that the total number of balanced functions is $\binom{2^n}{2^{n-1}}$. □
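For small n, Lemma 2 can be checked exhaustively. The sketch below is our own check, not taken from the paper: it enumerates all $\binom{16}{8} = 12870$ balanced 4-bit functions as 16-bit truth-table masks, tallies their exact nonlinearity, and prints it next to the count $(2^{n+1} - 2)\binom{2^{n-1}}{i}^2$ used in the proof. For n = 4 the two agree for i < 2, and the count exceeds the true number for larger i, where the sets around different affine functions overlap.

```python
from collections import Counter
from itertools import combinations
from math import comb

N_VARS = 4
SIZE = 2 ** N_VARS          # truth-table length
FULL = (1 << SIZE) - 1      # all-ones mask, used to complement a function

def popcount(v: int) -> int:
    return bin(v).count("1")

# Truth tables (as SIZE-bit masks) of all 2^(n+1) affine functions A·X ⊕ b.
affine = []
for a in range(SIZE):
    linear = sum(1 << x for x in range(SIZE) if popcount(a & x) & 1)
    affine += [linear, linear ^ FULL]

# Tally NL(f) over every balanced function (exactly SIZE/2 ones).
counts = Counter()
for ones in combinations(range(SIZE), SIZE // 2):
    f = sum(1 << x for x in ones)
    counts[min(popcount(f ^ g) for g in affine)] += 1

for i in range(SIZE // 4 + 1):
    lemma2 = (2 ** (N_VARS + 1) - 2) * comb(SIZE // 2, i) ** 2
    print(f"NL = {2 * i}: exact {counts[2 * i]:6d}, Lemma 2 count {lemma2:6d}")
```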



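Theorem 3. For an n × n bijective s-box, the probability that its nonlinearity is less than or equal to 2i is upper bounded by

$$\Pr(NL(S) \le 2i) \le (2^n - 1) \sum_{j=0}^{i} \frac{(2^{n+1} - 2)\binom{2^{n-1}}{j}^2}{\binom{2^n}{2^{n-1}}}. \tag{5}$$

Proof. Note that

$$\Pr(NL(S) \le 2i) \le (2^n - 1)\Pr(NL(f) \le 2i), \tag{6}$$

since NL(S) ≤ 2i requires at least one of the $2^n - 1$ non-zero linear combinations of the output functions, each balanced by Lemma 1, to have nonlinearity at most 2i. Summing the bound of Lemma 2 over j = 0, …, i gives (5). □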



The upper bound in Lemma 2 can be used as an approximation to the nonlinearity distribution of a balanced function provided the nonlinearity is not very high. So if we assume that all non-zero linear combinations of the s-box output functions are independent in terms of nonlinearity, then we obtain the approximation

$$\Pr(NL(S) \le 2i) \approx 1 - \left(1 - \Pr(NL(f) \le 2i)\right)^{2^n - 1}. \tag{7}$$

Table 1 shows that the experimental results, the approximation (7) and the upper bound (5) are close, particularly at the lower nonlinearities. We tested 10^{…} random bijective 8 × 8 s-boxes; only probabilities for nonlinearity from 80 to 98 are shown.
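The bound (5) and the approximation (7) are straightforward to evaluate with exact integer binomials. The sketch below (function names are ours) does so for n = 8, so its output can be set against the Theory and Bound columns of Table 1; it prints a dash where (5) exceeds 1, which is our reading of the dashes in the table. Note the use of `expm1`/`log1p`: for NL = 0 the per-function probability is of order $10^{-73}$, and computing $1 - (1 - p)^{255}$ naively would round to 0.

```python
from math import comb, expm1, log1p

def pr_nl_f_le(n: int, i: int) -> float:
    """Lemma 2 summed over j = 0..i: bound on Pr(NL(f) <= 2i) for balanced f."""
    balanced_affine = 2 ** (n + 1) - 2
    total = comb(2 ** n, 2 ** (n - 1))
    count = sum(balanced_affine * comb(2 ** (n - 1), j) ** 2 for j in range(i + 1))
    return count / total  # exact big-integer ratio, rounded once to a float

def theorem3_bound(n: int, i: int) -> float:
    """Upper bound (5): (2^n - 1) * Pr(NL(f) <= 2i)."""
    return (2 ** n - 1) * pr_nl_f_le(n, i)

def approx7(n: int, i: int) -> float:
    """Approximation (7): 1 - (1 - Pr(NL(f) <= 2i))^(2^n - 1)."""
    p = pr_nl_f_le(n, i)
    if p >= 1.0:
        return 1.0
    # expm1/log1p keep precision when p is tiny
    return -expm1((2 ** n - 1) * log1p(-p))

for nl in range(80, 100, 2):
    bound = theorem3_bound(8, nl // 2)
    print(nl, f"{approx7(8, nl // 2):.2e}", f"{bound:.2e}" if bound <= 1 else "-")
```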

A more complete nonlinearity distribution, based on the approximation (7), is plotted in Fig. 2. It indicates that s-boxes with nonlinearity greater than 98 are extremely rare. At the low end, the probability also decreases dramatically, so most s-boxes have nonlinearities between 80 and 98. For example, $\Pr(NL(S) < 80) = 3.63 \times 10^{-5}$ and $\Pr(NL(S) > 98) = 1.01 \times 10^{-5}$. Linear s-boxes (whose nonlinearity is 0) are very unlikely to occur, since the probability is about $2.26 \times 10^{-71}$, which agrees with the previous result in […] (after we allow for the fact that […] uses an old definition of nonlinearity).



4 XOR Distribution Table 

The success of differential cryptanalysis relies on the existence of a highly probable differential, which is equivalent to the existence of a large value in the XOR table of the SPN cipher when we view the entire cipher as a big s-box. Unfortunately, it is impractical to examine the properties of the XOR tables of large



Table 1. Experimental results versus the theoretical approximation in (7) and the upper bound in (5) for the nonlinearity of random bijective 8 × 8 s-boxes

NL ≤   Experiment      Theory          Bound
80     1.83 × 10^-4    1.83 × 10^-4    1.83 × 10^-4
82     8.70 × 10^-4    8.57 × 10^-4    8.57 × 10^-4
84     3.76 × 10^-3    3.74 × 10^-3    3.75 × 10^-3
86     1.52 × 10^-2    1.52 × 10^-2    1.53 × 10^-2
88     5.68 × 10^-2    5.68 × 10^-2    5.85 × 10^-2
90     1.88 × 10^-1    1.89 × 10^-1    2.09 × 10^-1
92     5.03 × 10^-1    5.03 × 10^-1    6.99 × 10^-1
94     8.89 × 10^-1    8.89 × 10^-1    -
96     9.98 × 10^-1    9.99 × 10^-1    -
98     1.00 × 10^0     1.00 × 10^0     -




Fig. 2. Nonlinearity distribution for 8 × 8 s-boxes based on the approximation expression in (7)
SPNs (e.g., 64-bit SPNs). One advantage of the basic SPN is its simple “scalable” structure, which makes it possible to study a smaller version such as the 16-bit SPN. We may then intuitively extrapolate the results to 64-bit SPNs, since they are constructed in a similar way. Thus, in this section we consider 16-bit SPNs (see Fig. 1) and the corresponding 16 × 16 bijective s-boxes. We use the chi-square test to show that the XOR table of the basic SPN resembles a large random XOR table as the number of rounds increases. This statistical method is also used as a cryptanalytic attack in […] and […].



4.1 Chi-square Test 



The first observation is that there are many large entries in the first row of the XOR table of an SPN with a small number of rounds. Based on one experiment, we plot the frequencies of the entries in the first row for SPNs with 4, 5, 6 and 7 rounds, respectively, in Fig. 3.

Although the plot shows only one sample and cannot represent the general case, the distinct differences in the entry distribution show that it is sensitive to the number of rounds. Furthermore, the entry distribution tends to stabilize as the number of rounds increases. It is natural to compare the XOR distribution table of an SPN to that of a random s-box (ideal cipher). We then employ the chi-square test to provide a quantitative measure of the difference.

The chi-square test is a standard test for comparing two distributions of binned data. The chi-square statistic is defined by

$$\chi^2 = \sum_i \frac{(R_i - S_i)^2}{R_i + S_i} \tag{8}$$

where both $R_i$ and $S_i$ are experimental data. Any term with $R_i = S_i = 0$ is omitted from the sum. In general, a large value of $\chi^2$ indicates a large difference between the two distributions. The above method is described in […].
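The statistic (8) takes only a few lines of code. As a check, the Python sketch below recomputes the two chi-square values of Table 2 from the bin counts listed there (taking the last bin as XOR values of 12 and above), using the random s-box as the reference and the 16.81 threshold quoted in Section 4.2. The function and variable names are ours.

```python
THRESHOLD = 16.81  # 6 degrees of freedom, 1% significance (from the text)

def chi_square(r: list[int], s: list[int]) -> float:
    """Statistic (8): sum of (R_i - S_i)^2 / (R_i + S_i), skipping empty bins."""
    if len(r) != len(s):
        raise ValueError("both distributions need the same bins")
    return sum((ri - si) ** 2 / (ri + si) for ri, si in zip(r, s) if ri + si > 0)

# Bin counts from Table 2 for XOR values 0, 2, 4, 6, 8, 10 and >= 12.
random_sbox = [39721, 19917, 4972, 809, 107, 8, 2]
spn_5_rounds = [40053, 19521, 4908, 861, 155, 19, 19]
spn_8_rounds = [39779, 19819, 5002, 814, 108, 13, 1]

for name, counts in (("5-round", spn_5_rounds), ("8-round", spn_8_rounds)):
    stat = chi_square(random_sbox, counts)
    verdict = "rejected" if stat > THRESHOLD else "accepted"
    print(f"{name} SPN: chi2 = {stat:.2f} ({verdict})")
# 5-round SPN: chi2 = 34.43 (rejected)
# 8-round SPN: chi2 = 1.92 (accepted)
```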



4.2 Comparison between SPNs and Random S-boxes 

Now we use the chi-square test (8) to examine the entry distribution in the XOR tables of SPNs and random s-boxes. As an example, we compare 5-round and 8-round 16-bit SPNs to a random 16 × 16 s-box with respect to the first row of the XOR tables, using the random s-box as the reference. The result is shown in Table 2, which gives the frequencies of the corresponding XOR values and the chi-square values. The threshold for 6 degrees of freedom and a 1% significance level is 16.81. Therefore the 5-round SPN is rejected and the 8-round SPN is accepted, which means that, in this particular test, an 8-round SPN behaves like a random cipher while a 5-round SPN does not.
Fig. 3. The entry distribution in the first row of the XOR table for a “sample” 16 × 16 SPN (panels for 4, 5, 6 and 7 rounds; horizontal axis: XOR table entry)



Although small chi-square values do not always mean that the distribution is ideal (i.e., looks random), consistently large chi-square values indicate a significant difference. Thus, for a complete comparison, it is straightforward to compare every row and consider the average of the chi-square values. However, our experimental results show that the chi-square values are sensitive to the row index, i.e., the input difference ΔX. We found that for input differences which influence a large number of s-boxes in the first round of the SPN, the corresponding row of the XOR table is “closer” to that of the random cipher than for input differences which influence a small number of s-boxes in the first round. For example, Table 3 shows the chi-square values for the input differences 0001, 0011, 0111 and 1111 (in hexadecimal), respectively.
Table 2. Chi-square test for SPNs and random s-boxes of size 16 × 16 (the threshold is 16.81)

XOR value   random s-box   5-round SPN   8-round SPN
0           39721          40053         39779
2           19917          19521         19819
4           4972           4908          5002
6           809            861           814
8           107            155           108
10          8              19            13
≥ 12        2              19            1
χ²          -              34.43         1.92



Table 3. Chi-square test for different rows of the XOR table

Rounds   ΔX = 0001      ΔX = 0011      ΔX = 0111      ΔX = 1111
3        5.53 × 10^…    1.71 × 10^…    3.40 × 10^…    3.33 × 10^…
4        9.32 × 10^…    1.31 × 10^…    7.39 × 10^…    7.23 × 10^…
5        4.93 × 10^…    8.81 × 10^…    4.06 × 10^…    4.43 × 10^…
6        6.46 × 10^…    4.20 × 10^…    6.25 × 10^…    4.03 × 10^…
7        1.55 × 10^…    6.07 × 10^…    2.81 × 10^…    1.27 × 10^…
8        4.90 × 10^…    2.16 × 10^…    2.14 × 10^…    7.79 × 10^…



Therefore, we concentrate on the average over those rows where the input difference influences only one s-box in the first round. There are in total $\binom{4}{1} \times 15 = 60$ such rows. Figure 4 illustrates the chi-square test results based on one experiment; the maximum and minimum chi-square values are also presented. The average chi-square value can be regarded as a measure of the distance to the ideal cipher. It can be seen that, on average, the distribution of the XOR table resembles that of the ideal cipher more closely as the number of rounds increases, and for the 16-bit SPN we need at least 7 rounds to make the distribution “random”. In practice, more rounds are usually needed because of fluctuations. For 7 or more rounds, the average chi-square value is very close to the result of comparing two random s-boxes.
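The count of 60 rows can be reproduced by enumerating the input differences that touch a single first-round s-box. The sketch assumes, as the hexadecimal notation of Table 3 suggests, that each hex digit of a 16-bit difference feeds one 4 × 4 s-box; the helper name is hypothetical.

```python
N_SBOXES, NIBBLE = 4, 4  # 16-bit SPN built from four 4 x 4 s-boxes

def single_sbox_differences() -> list[int]:
    """All 16-bit input differences that touch exactly one first-round s-box."""
    diffs = []
    for box in range(N_SBOXES):
        for nibble in range(1, 2 ** NIBBLE):  # the 15 non-zero nibble values
            diffs.append(nibble << (NIBBLE * box))
    return diffs

diffs = single_sbox_differences()
print(len(diffs))                             # 60
print(f"{diffs[0]:04x}", f"{diffs[-1]:04x}")  # 0001 f000
```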



5 Differential-like Attack 

Differential cryptanalysis of SPNs is based on the best characteristic instead of the best differential. Heys and Tavares […] derived upper bounds on the probability of the most likely differential characteristic as a function of the maximum XOR value and the number of active s-boxes (i.e., the s-boxes whose inputs are changed in the process of encrypting two plaintexts).

In this section, we present a new differential-like attack on basic SPNs. By 
modeling the number of active s-boxes in the network using Markov chains 
Fig. 4. Chi-square test for SPNs and random s-boxes with respect to the 60 rows of the XOR table (with one active s-box in the first round)



we may predict the number of active s-boxes in the second round provided that we make one s-box in the first round (the target s-box) active and know the number of active s-boxes in the last round. This enables us to determine the subkeys of the first round; the subsequent rounds can then be attacked similarly.



5.1 Principle of the Attack 



Consider an r-round SPN with $n_i$ representing the number of active s-boxes in round i (1 ≤ i ≤ r). The probability of k active s-boxes in round r given one active s-box in the first round is denoted by $\Pr(n_r = k \mid n_1 = 1)$; it is a transition probability over r − 1 rounds. Now the selection matrix is defined by $S = (s_{jk}^{(r)})$, where

$$s_{jk}^{(r)} = \frac{\Pr(n_r = k, n_2 = j \mid n_1 = 1)}{\Pr(n_r = k \mid n_1 = 1)}. \tag{9}$$

In other words, $s_{jk}^{(r)}$ is the probability of having j active s-boxes in the second round given that there is one active s-box in the first round and k active s-boxes in the last round. All of the above probabilities can be calculated from the transition matrix of the Markov chain […].
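Definition (9) follows from the Markov property: $\Pr(n_r = k, n_2 = j \mid n_1 = 1) = P_{1j}\,(P^{r-2})_{jk}$, where P is the one-round transition matrix on the number of active s-boxes. The Python sketch below computes S with exact fractions and lists the k for which $s_{1k}^{(r)} > 0.5$. The 3-state matrix is a made-up toy (two s-boxes per round), not the transition matrix of any SPN in the paper, and all names are ours.

```python
from fractions import Fraction

Matrix = list[list[Fraction]]

def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two square matrices of Fractions."""
    size = len(a)
    return [[sum((a[i][t] * b[t][j] for t in range(size)), Fraction(0))
             for j in range(size)] for i in range(size)]

def mat_pow(p: Matrix, e: int) -> Matrix:
    """p raised to the power e (e >= 0) by repeated multiplication."""
    size = len(p)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(e):
        result = mat_mul(result, p)
    return result

def selection_matrix(p: Matrix, r: int) -> Matrix:
    """S[j][k] = Pr(n_r = k, n_2 = j | n_1 = 1) / Pr(n_r = k | n_1 = 1), as in (9).

    States are numbers of active s-boxes 0..m; p[a][b] is the one-round
    probability of going from a to b active s-boxes. Requires r >= 2.
    """
    later = mat_pow(p, r - 2)  # transitions from round 2 to round r
    size = len(p)
    s = [[Fraction(0)] * size for _ in range(size)]
    for k in range(size):
        joint = [p[1][j] * later[j][k] for j in range(size)]
        marginal = sum(joint, Fraction(0))  # Pr(n_r = k | n_1 = 1)
        if marginal:
            for j in range(size):
                s[j][k] = joint[j] / marginal
    return s

# Toy chain on 0, 1 or 2 active s-boxes per round; the numbers are made up.
P = [[Fraction(v) for v in row] for row in
     (("1", "0", "0"), ("0", "1/2", "1/2"), ("0", "1/4", "3/4"))]
S = selection_matrix(P, r=3)
selected = [k for k in range(len(P)) if S[1][k] > Fraction(1, 2)]
print(S[1][1], S[1][2], selected)  # 2/3 2/5 [1]
```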

Now, from the matrix S, we may predict the number of active s-boxes in the second round by selecting the entries greater than 50%. If we know how many
s-boxes are active in the second round, then we know the output changes of
the target s-box. Since the exact inputs to the target s-box are known, we can 
increment the counters of possible subkeys according to the XOR table of the 
target s-box. After we examine a number of chosen plaintext pairs, the correct 
subkey will be counted more often than all the others. The same method is used 
to derive all subkeys in the first round. If the first round is broken, then we can 
break the subsequent rounds in the same manner. 

We find that it is highly probable that only one s-box in the second round is active if a small number of s-boxes are active in the last round. This also conforms with our intuition. So only the first row of the matrix S is important. If we define the selection set of the r-round SPN as $T_r = \{k \mid s_{1k}^{(r)} > 0.5\}$, then the algorithm for attacking the target s-box in the first round of the r-round SPN is:

1. Encrypt a pair of random plaintexts such that only the target s-box is active. If the number of active s-boxes in the last round is not in the set $T_r$, then go to 1.

2. Increment the counters of possible subkeys according to those XOR table entries which make only one s-box in the second round active. If no subkey has a counter exceeding all the others by a threshold value (e.g., 2), then go to 1.

3. Stop. The subkey of the target s-box is found. 
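The counting in steps 2 and 3 can be sketched for a single 4 × 4 target s-box. This is a toy model under loudly stated assumptions. First, the last-round filter of step 1 is replaced by an oracle that accepts exactly the pairs whose true output difference activates one second-round s-box. Second, because the basic SPN permutation sends the n output bits of an s-box to n different s-boxes, "one active s-box in round 2" is taken to mean an output difference of Hamming weight 1. The s-box, the secret subkey and all names are hypothetical. Since the correct subkey is counted on every accepted pair, any subkey that leads all others by the threshold must be the correct one.

```python
import random
from typing import Optional

SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]  # hypothetical 4 x 4 target s-box

def weight_one(v: int) -> bool:
    """True if v has exactly one bit set (one active s-box in round 2)."""
    return v != 0 and v & (v - 1) == 0

def recover_subkey(secret: int, threshold: int = 2, seed: int = 0,
                   max_draws: int = 10_000) -> Optional[int]:
    """Toy version of steps 1-3 for one target s-box keyed with `secret`."""
    rng = random.Random(seed)
    counters = [0] * 16
    for _ in range(max_draws):
        p, dx = rng.randrange(16), rng.randrange(1, 16)  # input bits and difference
        x = p ^ secret  # actual s-box input after key mixing (unknown to the attacker)
        # Step 1, replaced by an oracle: keep only pairs whose output change
        # activates exactly one s-box in round 2.
        if not weight_one(SBOX[x] ^ SBOX[x ^ dx]):
            continue
        # Step 2: count every candidate subkey consistent with such a change.
        for k in range(16):
            if weight_one(SBOX[p ^ k] ^ SBOX[p ^ k ^ dx]):
                counters[k] += 1
        best, second = sorted(counters, reverse=True)[:2]
        if best - second >= threshold:
            return counters.index(best)  # Step 3: subkey found
    return None

print(recover_subkey(secret=0x9))
```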

The number of chosen plaintext pairs required to determine the subkeys in the first round may be approximated by $N_p = c / P_d$, where c is a constant which may be approximated by 6m, m is the number of s-boxes in one round (similar to the results in […]), and

$$P_d = \sum_{k \in T_r} \cdots \tag{10}$$

The threshold value corresponds to the confidence level of success: the higher the value, the more confident we are that the subkey is correct, and the more chosen plaintext pairs we need. The selection set may become empty as the number of rounds increases, which suggests that such SPNs are immune to our attack.

5.2 Experimental Results

Our analytical results for the 64-bit SPN with randomly selected 8 × 8 s-boxes are shown in Table 4. Here we define the complexity as the number of chosen plaintext pairs required to determine the first-round subkeys, computed from $N_p$ with c = 50. Our attack is effective against SPNs of up to 11 rounds. Moreover, if we guess the 8-bit subkey associated with the target s-box in the first round, we may attack one more round with the complexity increased by a factor of $2^8$. This is not practical, however, because the number of plaintext pairs required for 12 rounds is approximately equal to the total number of plaintext pairs available for a 64-bit SPN.

Table 4. Differential-like cryptanalysis of a 64-bit SPN with 8 × 8 s-boxes

Rounds   Complexity (N_p)
2        2^…
3        2^…
4        2^…
5        2^…
6        2^…
7        2^…
8        2^…
9        2^…
10       2^…
11       2^…
12       2^…

A simulation program was run to attack a 64-bit SPN composed of 8 × 8 s-boxes with maximum XOR table entry XOR* = 4. The experimental results for up to 8 rounds¹ are plotted in Fig. 5, which also shows the theoretical complexity of our new attack and of classical differential cryptanalysis. Note that the example seems unfavorable to differential cryptanalysis, since we use a set of s-boxes with small XOR*, which are generated from […] and […]. The expected XOR* value of a randomly selected 8 × 8 bijective s-box is upper bounded by 16. In practice, however, the value is about 12, and the highly probable characteristic cannot always make use of all of these large values. So our attack outperforms the classical differential attack in a practical sense.



5.3 Comments on the Attack 

The implementation of our attack can be improved. By carefully choosing plaintext pairs that make it more likely that only one s-box in the second round is active (this can be achieved by inspecting the XOR table of the target s-box), we may speed up the attack by a factor of two. However, the gain is not significant when attacking a large number of rounds.

In fact, our attack exploits the slow avalanche effect of basic SPNs, so the use of s-boxes with a high diffusion order […] could minimize the impact of the attack. In addition, replacing the permutation with the linear transformation […] or multipermutations could also thwart the attack effectively. However, this introduces a delay which is significant for software implementation.

Both our attack and the classical differential attack are chosen plaintext 
attacks. The fundamental difference between our attack and the classical attack 
is that there is a filtering process in our attack. 



¹ The experiment for 8 rounds takes about 20 days on a Sun-ULTRA 1 machine.
Fig. 5. Comparison of classical differential cryptanalysis and our new attack on a 64-bit SPN with 8 × 8 s-boxes



Our attack first selects the ciphertexts, then checks those selected, while the 
classical attack checks every ciphertext trying to derive the key. Hence, our attack 
can easily take advantage of distributed computing (e.g., over the Internet). 
Furthermore, it is easy to implement our attack since only minimal preliminary 
analysis is needed. 

In order to obtain secure SPNs, we need to make them behave like random big s-boxes. It is necessary to make the one-bit propagation probability less than the random probability, so that the SPN is not distinguishable from a random s-box. This results in an estimate of the minimum number of rounds required, denoted by r, as follows:




where n is the size of the s-boxes. After simplifying the above expression, we get

$$r > n + \log_2 n. \tag{12}$$

It can then be seen that when n = 4 (i.e., for 16-bit SPNs) we need at least 7 rounds, and when n = 8 (i.e., for 64-bit SPNs) we need at least 12 rounds. These agree with our previous results.
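Bound (12) gives the minimum round counts quoted above. The small sketch below (hypothetical helper name) returns the smallest integer r with $r > n + \log_2 n$.

```python
import math

def min_rounds(n: int) -> int:
    """Smallest integer r satisfying r > n + log2(n), as in (12)."""
    return math.floor(n + math.log2(n)) + 1

print(min_rounds(4), min_rounds(8))  # 7 12
```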
6 Conclusion

We have presented an upper bound on the nonlinearity distribution of randomly selected bijective s-boxes, which shows that low nonlinearities are very unlikely for large s-boxes. Note that we have only considered bijective s-boxes throughout this work. An SPN may be regarded as a set of large s-boxes indexed by the keys.

Based on the experimental results on XOR table distributions, and supported by the results on nonlinearity, we have shown that the basic SPN converges to the ideal cipher as the number of rounds increases. In addition, we have presented a practical differential-like attack on basic SPNs. The attack shows that the number of active s-boxes is very important: for a secure SPN, it is necessary to make the number of active s-boxes in the last round independent of the number of active s-boxes in previous rounds. This may be equivalent to $\Pr(\Delta Y \mid \Delta X) = \Pr(\Delta Y)$ (where ΔX and ΔY are the plaintext and ciphertext differences, respectively), which implies that the ciphertext is a random permutation of the plaintext. Based on Markov chains, it is found that the number of active s-boxes in the last round tends to become independent for basic SPNs as the number of rounds increases. These experiments and analytical estimates may be regarded as some evidence towards provable security for SPN cryptosystems.
Abstract. Maurer’s universal test is a very common randomness test, capable of detecting a wide gamut of statistical defects. The algorithm is simple (a few lines of Java code), flexible (a variety of parameter combinations can be chosen by the tester) and fast. Although the test is based on sound probabilistic grounds, one of its crucial parts uses the heuristic approximation

$$c(L, K) \simeq 0.7 - \frac{0.8}{L} + \left(1.6 + \frac{12.8}{L}\right) K^{-4/L}.$$

In this work we compute the precise value of c(L, K) and show that the inaccuracy due to the heuristic estimate can make the test 2.67 times more permissive than what is theoretically admitted. Moreover, we establish a new asymptotic relation between the test parameter and the source’s entropy.



1 Introduction 

In statistics, randomness refers to those situations where care is taken to see that each individual has the same chance of being included in the sample group. In practice, random sampling is not easy: when seeking a random sample of people, it is not good enough to stand on a street corner and select every fifth person who passes, as this would exclude habitual motorists from the sample; call on 50 homes in different areas, and you may end up with only housewives’ opinions, their husbands being at work; pick a set of names from a telephone directory, and you exclude in limine those who do not have a telephone.

Whilst the use of random samples proves helpful in literally thousands of fields, non-random sampling is fatally disastrous in cryptography. Assessing the randomness of noisy sources is therefore crucial, and a variety of tests for doing so exists. Interestingly, most if not all such tests are designed around a common skeleton, called the monkey paradigm. Informally, the idea consists in measuring how likely it is that a monkey playing with a typewriter would create a meaningful text. Although one can easily conclude that a complex text (e.g. the IACR’s bylaws) has a negligible monkey probability, a simple word such as cat is expected to appear more frequently (once every 26^3 = 17,576 keystrokes) and could be used as a basic (yet very insufficient) randomness test.

However, analyzing textual features is much more efficient than pattern scanning, in which inter-pattern information is wasted without being recycled for deriving additional monkeyness evidence.

Usually, parameters such as the average inter-symbol distance or the length 
of sequences containing the complete alphabet are measured in a sample and a 
parameter is calculated from the difference between the measure and its corre- 
sponding expectation when a monkey, theorized as a binary symmetric source 
(BSS), is given control over the keyboard. A BSS is a random source which 
outputs statistically independent and symmetrically distributed binary random 
variables. Based on the expected distribution of the BSS’ parameter, the test 
succeeds or fails. 

We refer the reader to [2,4] for a systematic treatment of randomness tests 
and focus the following sections on a particular test, suggested by Maurer in [5]. 



2 Maurer’s universal test 



Maurer’s universal test [5] takes as input three integers {L, Q, K} and a $(Q + K) \times L = N$-bit sample $s^N = [s_1, \ldots, s_N]$ generated by the tested source.

Let B denote the set {0, 1}. Denoting by $b_n(s^N) = [s_{L(n-1)+1}, \ldots, s_{Ln}]$ the n-th L-bit block of $s^N$, the test function $f_{TU}$ is defined by:

$$f_{TU}(s^N) = \frac{1}{K} \sum_{n=Q+1}^{Q+K} \log_2 A_n(s^N) \tag{1}$$

where

$$A_n(s^N) = \begin{cases} n & \text{if } \forall i < n,\ b_{n-i}(s^N) \neq b_n(s^N) \\ \min\{i : i \ge 1,\ b_n(s^N) = b_{n-i}(s^N)\} & \text{otherwise.} \end{cases}$$

To tune the test’s rejection rate, one must first know the distribution of $f_{TU}(R^N)$, where $R^N$ denotes a sequence of N bits emitted by a BSS. A sample would then be rejected if the number of standard deviations separating its $f_{TU}$ from $E[f_{TU}(R^N)]$ exceeds a reasonable constant¹.

For statistically independent random variables the variance of a sum is the sum of the variances, but the $A_n$-terms in (1) are heavily inter-dependent; consequently, [5] introduces a corrective factor c(L, K) by which the standard deviation of $f_{TU}$ is reduced compared to what it would have been if the $A_n$-terms were independent:

$$\mathrm{Var}[f_{TU}(R^N)] = c(L, K)^2 \times \frac{\mathrm{Var}[\log_2 A_n(R^N)]}{K} \tag{2}$$



¹ The precise value of $E[f_{TU}(R^N)]$ is computed in [5] and recalled in section 3.3.
A heuristic estimate of c(L, K) is given for practical purposes in [5]:

$$c(L, K) \simeq c'(L, K) = 0.7 - \frac{0.8}{L} + \left(1.6 + \frac{12.8}{L}\right) K^{-4/L}$$
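For reference, the heuristic $c'(L, K)$ can be evaluated as below (hypothetical function name). As K grows, the correction term vanishes and $c'(L, K)$ tends to $0.7 - 0.8/L$.

```python
def c_heuristic(L: int, K: int) -> float:
    """Maurer's heuristic c'(L, K) = 0.7 - 0.8/L + (1.6 + 12.8/L) * K**(-4/L)."""
    return 0.7 - 0.8 / L + (1.6 + 12.8 / L) * K ** (-4 / L)

for K in (1000, 10_000, 100_000):
    print(K, round(c_heuristic(8, K), 4))
```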

In the next section we compute the precise value of c(L, K) under the admissible assumption that Q → ∞ (in practice, Q should be larger than $10 \times 2^L$); this enables a much better tuning of the test’s rejection rate (according to [5], the precise computation of c(L, K) would have required a considerable, if not prohibitive, computing effort).



3 An accurate expression of c(L, K)

3.1 Preliminary computations 

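For any set of random variables, we have:

$$\mathrm{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{Var}[X_i] + 2 \sum_{1 \le i < j \le n} \mathrm{Cov}[X_i, X_j] \tag{3}$$

where $\mathrm{Cov}[X_i, X_j]$ is the covariance of $X_i$ and $X_j$:

$$\mathrm{Cov}[X_1, X_2] = E[X_1 X_2] - E[X_1] \times E[X_2] \tag{4}$$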

Throughout this paper the notation $a_i = \log_2 A_i$ will be extensively used and, unless specified otherwise, $A_i$ will stand for $A_i(R^N)$.

Formulae (1), (2) and (3) yield:

$$c(L, K)^2 = 1 + \frac{2}{K\,\mathrm{Var}[a_n]} \sum_{1 \le i < j \le K} \mathrm{Cov}[a_{Q+i}, a_{Q+j}]$$

Assuming that Q → ∞ (in practice, $Q > 10 \times 2^L$), the covariance of $a_i$ and $a_j$ is only a function of k = j − i, and by the change of variables k = j − i we get:

$$c(L, K)^2 = 1 + \frac{2}{\mathrm{Var}[a_n]} \sum_{k=1}^{K-1} \left(1 - \frac{k}{K}\right) \times \mathrm{Cov}[a_n, a_{n+k}] \tag{5}$$



whereas (4) yields:

$$\mathrm{Cov}[a_n, a_{n+k}] = \sum_{i, j \ge 1} \log_2 i \, \log_2 j \, \Pr[A_{n+k} = j, A_n = i] - E[a_n]^2 \tag{6}$$
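The change of variables leading to (5) can be checked numerically. For any stationary covariance sequence, the double sum over $1 \le i < j \le K$ and the single sum over lags in (5) must agree, because each lag k occurs exactly K − k times. The covariance values in the sketch are arbitrary placeholders, not computed from a BSS, and the function names are ours.

```python
import random

def c_squared_pairs(cov: list[float], var: float, K: int) -> float:
    """1 + 2/(K Var[a_n]) * sum over 1 <= i < j <= K of Cov(j - i) (stationary)."""
    s = sum(cov[j - i] for i in range(1, K + 1) for j in range(i + 1, K + 1))
    return 1 + 2 * s / (K * var)

def c_squared_lags(cov: list[float], var: float, K: int) -> float:
    """Equation (5): 1 + 2/Var[a_n] * sum_{k=1}^{K-1} (1 - k/K) Cov(k)."""
    return 1 + 2 / var * sum((1 - k / K) * cov[k] for k in range(1, K))

rng = random.Random(7)
K, var = 40, 3.0
cov = [var] + [rng.uniform(-0.1, 0.1) for _ in range(K)]  # cov[k]: lag-k placeholder
print(c_squared_pairs(cov, var, K), c_squared_lags(cov, var, K))
```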

Considering a source emitting the random variables $U^N = U_1, U_2, \ldots, U_N$, and letting $b_n = b_n(U^N)$, we get:

$$\Pr[A_n(U^N) = i] = \sum_{b \in B^L} \Pr[b_n = b,\ b_{n-1} \neq b, \ldots, b_{n-i+1} \neq b,\ b_{n-i} = b]$$
and, when the $b_n(U^N)$-blocks are statistically independent and uniformly distributed,

$$\Pr[A_n(U^N) = i] = \sum_{b \in B^L} \Pr[b_n = b]^2 \times (1 - \Pr[b_n = b])^{i-1}$$

For a BSS we thus have:

$$\Pr[A_n = i] = 2^{-L}(1 - 2^{-L})^{i-1} \quad \text{for } i \ge 1$$
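The BSS law above gives $E[a_n]$ and $\mathrm{Var}[a_n]$ directly, both of which are needed in (5) and (6). The sketch below truncates the series after $64 \cdot 2^L$ terms, beyond which the remaining probability mass is below $e^{-64}$; the function name and the truncation rule are our own choices.

```python
import math

def bss_moments(L: int, tail: int = 64) -> tuple[float, float]:
    """E[log2 A_n] and Var[log2 A_n] for a BSS, using Pr[A_n = i] = 2^-L (1 - 2^-L)^(i-1).

    The series is truncated after tail * 2^L terms; the neglected probability
    mass is (1 - 2^-L)^(tail * 2^L) < e^-tail.
    """
    p = 2.0 ** -L
    q = 1.0 - p
    mean = second = 0.0
    prob = p  # Pr[A_n = 1]
    for i in range(1, tail * 2 ** L + 1):
        a = math.log2(i)
        mean += prob * a
        second += prob * a * a
        prob *= q  # Pr[A_n = i + 1]
    return mean, second - mean * mean

print(bss_moments(1))
print(bss_moments(8))
```

For L = 1 this gives $E[a_n] \approx 0.7326$ and $\mathrm{Var}[a_n] \approx 0.690$.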

3.2 Expression of $\Pr[A_{n+k} = j, A_n = i]$

Deriving the BSS’ $\Pr[A_{n+k} = j, A_n = i]$ for a fixed i ≥ 1 and variable j ≥ 1 is somewhat more technical and requires the separate analysis of five distinct cases:

• Disjoint blocks: $1 \le j \le k - 1$




Fig. 1. DISJOINT SEQUENCES. 



When $1 \le j \le k - 1$, the events $(A_{n+k} = j)$ and $(A_n = i)$ are independent, as there is no overlap between $[b_{n+k-j} \ldots b_{n+k}]$ and $[b_{n-i} \ldots b_n]$ (figure 1); consequently,

$$\Pr[A_{n+k} = j, A_n = i] = \Pr[A_{n+k} = j] \times \Pr[A_n = i]$$

$$\Pr[A_{n+k} = j, A_n = i] = 2^{-2L}(1 - 2^{-L})^{i+j-2}$$



• Adjacent blocks: $j = k$




Fig. 2. ADJACENT SEQUENCES. 
Letting $b = b_{n+k} = b_n = b_{n-i}$ and letting $\mathcal{E}_{j=k}[b]$ be the event (figure 2)

$$\mathcal{E}_{j=k}[b] = \{b_{n+k} = b;\ b_{n+k-1}, \ldots, b_{n+1} \neq b;\ b_n = b;\ b_{n-1}, \ldots, b_{n-i+1} \neq b;\ b_{n-i} = b\},$$

$$\Pr[\mathcal{E}_{j=k}[b]] = \Pr[b_{n+k} = b] \times \Pr[b_{n+k-1}, \ldots, b_{n+1} \neq b] \times \Pr[b_n = b] \times \Pr[b_{n-1}, \ldots, b_{n-i+1} \neq b] \times \Pr[b_{n-i} = b],$$

we get

$$\Pr[\mathcal{E}_{j=k}[b]] = \Pr[b_n = b]^3 \times \Pr[b_n \neq b]^{k+i-2} = 2^{-3L}(1 - 2^{-L})^{k+i-2}$$

$$\Pr[A_{n+k} = k, A_n = i] = \sum_{b \in B^L} \Pr[\mathcal{E}_{j=k}[b]]$$

$$\Pr[A_{n+k} = k, A_n = i] = 2^{-2L}(1 - 2^{-L})^{k+i-2}$$



• Intersecting blocks k + 1 ≤ j ≤ k + i − 1





Fig. 3. INTERSECTING SEQUENCES. 



For k + 1 ≤ j ≤ k + i − 1, the sequence [b_{n+k−j} … b_{n+k}] intersects [b_{n−i} … b_n] as illustrated in figure 3. Letting b = b_{n+k} = b_{n+k−j} and b' = b_n = b_{n−i}, we get the following configuration, denoted ℰ_{k+1≤j≤k+i−1}[b, b']:

$$\mathcal{E}_{k+1\le j\le k+i-1}[b,b'] = \{b_{n+k} = b,\ b_{n+k-1}\ne b, \ldots, b_{n+1}\ne b,\ b_n = b',\ b_{n-1}\notin\{b,b'\}, \ldots, b_{n+k-j+1}\notin\{b,b'\},\ b_{n+k-j} = b,\ b_{n+k-j-1}\ne b', \ldots, b_{n-i+1}\ne b',\ b_{n-i} = b'\}$$

$$\Pr[A_{n+k} = j, A_n = i] = \sum_{b,b'\in B^L,\ b\ne b'}\Pr[\mathcal{E}_{k+1\le j\le k+i-1}[b,b']]$$

whereby, for a BSS:

$$\Pr[b_n = b] = \Pr[b_n = b'] = 2^{-L}, \qquad \Pr[b_n\ne b] = 1-2^{-L}, \qquad \Pr[b_n\notin\{b,b'\}] = 1-2\times 2^{-L}$$



and finally:

$$\Pr[A_{n+k} = j, A_n = i] = 2^{-2L}(1-2^{-L})^{i+k-2}\Big(1-\frac{1}{2^L-1}\Big)^{j-k-1}$$

• The forbidden case j = k + i




Fig. 4. THE FORBIDDEN CASE. 



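If A_n = i, A_{n+k} cannot be equal to k + i, as shown in figure 4: A_{n+k} = k + i would require b_{n−i} = b_{n+k}, but A_n = i gives b_n = b_{n−i}, so b_n = b_{n+k} would be a nearer occurrence lying strictly between positions n − i and n + k. Hence:

$$\Pr[A_{n+k} = k+i, A_n = i] = 0$$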

• Inclusive blocks j ≥ k + i + 1




Fig. 5. INCLUSIVE SEQUENCES. 



For j ≥ k + i + 1, the sequence [b_{n−i} … b_n] is included in [b_{n+k−j} … b_{n+k}]. As depicted in figure 5, the blocks of [b_{n+1} … b_{n+k−1}] differ from b, those of [b_{n−i+1} … b_{n−1}] differ from both b and b', and those of [b_{n+k−j+1} … b_{n−i−1}] differ from b. Letting ℰ_{j≥k+i+1}[b, b'] be the event:

$$\mathcal{E}_{j\ge k+i+1}[b,b'] = \{b_{n+k} = b,\ b_{n+k-1}\ne b, \ldots, b_{n+1}\ne b,\ b_n = b',\ b_{n-1}\notin\{b,b'\}, \ldots, b_{n-i+1}\notin\{b,b'\},\ b_{n-i} = b',\ b_{n-i-1}\ne b, \ldots, b_{n+k-j+1}\ne b,\ b_{n+k-j} = b\}$$

$$\Pr[A_{n+k} = j, A_n = i] = \sum_{b,b'\in B^L,\ b\ne b'}\Pr[\mathcal{E}_{j\ge k+i+1}[b,b']]$$

we obtain:

$$\Pr[A_{n+k} = j, A_n = i] = 2^{-2L}(1-2^{-L})^{j-2}\Big(1-\frac{1}{2^L-1}\Big)^{i-1}$$
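The five cases can be collected into a single function. The sketch below is a direct transcription of the formulas above (the function names joint_prob and marginal_from_joint are hypothetical). As a sanity check, summing over j should give back the marginal Pr[A_n = i] = 2^{-L}(1 − 2^{-L})^{i−1}, which the usage lines compare for a few (L, k, i).

```python
def joint_prob(j, i, k, L):
    """Pr[A_{n+k} = j, A_n = i] for a BSS with L-bit blocks (i, j, k >= 1)."""
    p = 2.0 ** -L
    u = 1.0 - p                          # Pr[b_m != b] for a fixed block b
    v = 1.0 - 1.0 / (2**L - 1)           # equals (1 - 2p) / (1 - p)
    if j <= k - 1:                       # disjoint blocks
        return p * p * u ** (i + j - 2)
    if j == k:                           # adjacent blocks
        return p * p * u ** (k + i - 2)
    if j <= k + i - 1:                   # intersecting blocks
        return p * p * u ** (i + k - 2) * v ** (j - k - 1)
    if j == k + i:                       # forbidden case
        return 0.0
    return p * p * u ** (j - 2) * v ** (i - 1)   # inclusive blocks


def marginal_from_joint(i, k, L, j_max=3000):
    """Sum the joint law over j = 1..j_max; it should approach Pr[A_n = i]."""
    return sum(joint_prob(j, i, k, L) for j in range(1, j_max + 1))


cases = [(3, 1, 1), (3, 2, 4), (2, 5, 3)]
for L, k, i in cases:
    p = 2.0 ** -L
    print(L, k, i, marginal_from_joint(i, k, L), p * (1 - p) ** (i - 1))
```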



3.3 Expression of c(L, K)

Let us now define the function:

$$h(z,k) = (1-z)\sum_{i=1}^{\infty}\log_2(i+k)\, z^{i-1}$$

For a fixed z, the sequence {h(z, k)}_k has the inductive property:

$$h(z,k) = (1-z)\log_2(k+1) + z\times h(z,k+1) \qquad (7)$$

Let

$$u = 1-\frac{1}{2^L} \qquad\text{and}\qquad v = 1-\frac{1}{2^L-1}$$
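Recurrence (7) is what makes tabulation cheap: one truncated series evaluation at the largest index, followed by a backward sweep of (7), yields h(z, 0), …, h(z, k_max). The sketch below assumes a fixed truncation length n_terms (the text only states that about 33 × 2^L terms suffice in practice) and compares the sweep with direct series evaluation; the function names are illustrative.

```python
import math


def h_series(z, k, n_terms):
    """h(z, k) = (1 - z) * sum_{i=1}^{n_terms} log2(i + k) z^(i-1) (truncated series)."""
    total, zi = 0.0, 1.0
    for i in range(1, n_terms + 1):
        total += math.log2(i + k) * zi
        zi *= z
    return (1.0 - z) * total


def h_table(z, k_max, n_terms):
    """[h(z, 0), ..., h(z, k_max)] from one series value and recurrence (7) run backwards."""
    table = [0.0] * (k_max + 1)
    table[k_max] = h_series(z, k_max, n_terms)
    for k in range(k_max - 1, -1, -1):
        table[k] = (1.0 - z) * math.log2(k + 1) + z * table[k + 1]
    return table


L = 3
u = 1.0 - 2.0 ** -L
v = 1.0 - 1.0 / (2**L - 1)
tab_u = h_table(u, 50, 2000)
print(tab_u[0], h_series(u, 0, 2000))   # both approximate h(u, 0)
```

Running (7) backwards is numerically benign because each step multiplies the inherited error by z < 1.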

The expected value E[f_TU(R^N)] of the test parameter f_TU(R^N) for a BSS is given by:

$$E[f_{TU}(R^N)] = E[a_n] = \sum_{i=1}^{\infty}\log_2 i\times\Pr[A_n = i] = h(u,0)$$

and the variance of a_n is:

$$\mathrm{Var}[a_n] = E[(a_n)^2] - (E[a_n])^2 = 2^{-L}\sum_{i=1}^{\infty}(1-2^{-L})^{i-1}\log_2^2 i - h(u,0)^2$$
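Both closed forms can be evaluated with plain truncated sums. The sketch below (function name hypothetical) uses 64 × 2^L terms, a generous margin over the 33 × 2^L suggested in Section 3.4, and its output can be compared with the first two columns of Table 1.

```python
import math


def bss_moments(L, n_terms=None):
    """Return (E[a_n], Var[a_n]) for a BSS, with a_n = log2 A_n, via truncated sums."""
    p = 2.0 ** -L
    u = 1.0 - p
    if n_terms is None:
        n_terms = 64 * 2**L
    m1 = m2 = 0.0
    w = p                                # w = Pr[A_n = i] = p * u^(i-1)
    for i in range(1, n_terms + 1):
        a = math.log2(i)
        m1 += w * a
        m2 += w * a * a
        w *= u
    return m1, m2 - m1 * m1


for L in (3, 8):
    mean, var = bss_moments(L)
    print(L, round(mean, 7), round(var, 7))
```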

From equation (6) and the expressions of Pr[A_{n+k} = j, A_n = i], one can derive the following expression:

$$\mathrm{Cov}[a_n, a_{n+k}] = u^k\Big(h(u,0)\big(h(v,k)-h(u,k)\big) + 2^{-L}\sum_{i=1}^{\infty}\log_2 i\,(uv)^{i-1}\big(h(u,k+i)-h(v,k+i-1)\big)\Big)$$
and, using equation (5), we finally obtain:






q{L,l)-q{L,K) 

K 



where : 



Z=1 Z=1 



F{1, L, K) = u'^ (h{v, l + K-1)- h{u, I + K)^ [h{v, 0) - v^h{v, o) 
+u X h{u, 0) {h{u, l + K-l)- h{v, 1 + K-1)'J 

G{1, L, K) = u(^h{v, l + K-1)- h{u, I + K)'J 

^{1 + K) {h{v, 0) — v^h{v, 1)) — 2”^ zlog 2 

+u(; + it' - l) h{u, 0) (/z(u, l + K-1)- h{v, l + K-1)'^ 
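The closed-form expression of Cov[a_n, a_{n+k}] given earlier in this subsection can be cross-checked numerically against definition (6) applied to the five-case joint law of Section 3.2. The sketch below does exactly that for L = 3; the truncation length N and the function names are assumptions, and both routines are deliberately naive O(N²) loops.

```python
import math


def bss_params(L):
    p = 2.0 ** -L
    return p, 1.0 - p, 1.0 - 1.0 / (2**L - 1)          # p, u, v


def cov_closed_form(k, L, N=400):
    """Cov[a_n, a_{n+k}] from the closed form, with h(z, m) truncated at N terms."""
    p, u, v = bss_params(L)

    def h(z, m):
        return (1.0 - z) * sum(math.log2(i + m) * z ** (i - 1) for i in range(1, N + 1))

    hu = [h(u, m) for m in range(k + N + 1)]            # h(u, 0..k+N)
    hv = [h(v, m) for m in range(k + N)]                # h(v, 0..k+N-1)
    tail = sum(math.log2(i) * (u * v) ** (i - 1) * (hu[k + i] - hv[k + i - 1])
               for i in range(2, N + 1))                # the i = 1 term is zero
    return u ** k * (hu[0] * (hv[k] - hu[k]) + p * tail)


def cov_brute_force(k, L, N=400):
    """Same covariance from (6) and the five-case joint law, with i, j <= N."""
    p, u, v = bss_params(L)

    def joint(j, i):
        if j <= k - 1:
            return p * p * u ** (i + j - 2)
        if j == k:
            return p * p * u ** (k + i - 2)
        if j <= k + i - 1:
            return p * p * u ** (i + k - 2) * v ** (j - k - 1)
        if j == k + i:
            return 0.0
        return p * p * u ** (j - 2) * v ** (i - 1)

    mean = sum(p * u ** (i - 1) * math.log2(i) for i in range(1, N + 1))
    second = sum(math.log2(i) * math.log2(j) * joint(j, i)
                 for i in range(2, N + 1) for j in range(2, N + 1))
    return second - mean * mean


for k in (1, 2, 5):
    print(k, cov_closed_form(k, 3), cov_brute_force(k, 3))
```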



3.4 Computing c(L, K) in practice

The functions h(u, k), h(v, k), p(L, K) and q(L, K) are all power series in u or v and converge rapidly (t = 33 × 2^L terms are experimentally sufficient).

To speed things up further, the values

$$\{h(u,k)\}_{1\le k<2t} \qquad\text{and}\qquad \{h(v,k)\}_{1\le k<2t}$$

could be tabulated to compute c(L, K) in O(2^L).

For K > t, we get, with an excellent approximation:

$$c(L,K)^2 \simeq d(L) + \frac{e(L)\times 2^L}{K} \qquad (8)$$



where

$$d(L) = 1 - \frac{2\,p(L,1)}{\mathrm{Var}[a_n]} \qquad\text{and}\qquad e(L) = \frac{q(L,1)}{\mathrm{Var}[a_n]}\times 2^{-L+1}$$



In most cases approximation (8) is sufficient, as [5] recommends choosing K ≥ 1000 × 2^L > 33 × 2^L.
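Because the expressions of p(L, K) and q(L, K) are not reproduced legibly here, the sketch below takes a different, hedged route to the constants of (8). It evaluates S0 = Σ_k Cov[a_n, a_{n+k}] and S1 = Σ_k k·Cov[a_n, a_{n+k}] from the covariance formula of Section 3.3. For K much larger than the decay length of the covariances, (5) gives c(L, K)² ≈ 1 + 2S0/Var[a_n] − 2S1/(K·Var[a_n]), so d(L) = 1 + 2S0/Var[a_n] and e(L) = −2S1/(2^L·Var[a_n]). The function names and the truncation length N are assumptions. The pure-Python double loop is O(N²), so the sketch is only practical for small L.

```python
import math


def h_table(z, m_max, n_terms):
    """[h(z, 0), ..., h(z, m_max)]: one truncated series, then recurrence (7) backwards."""
    t = [0.0] * (m_max + 1)
    t[m_max] = (1.0 - z) * sum(math.log2(i + m_max) * z ** (i - 1)
                               for i in range(1, n_terms + 1))
    for m in range(m_max - 1, -1, -1):
        t[m] = (1.0 - z) * math.log2(m + 1) + z * t[m + 1]
    return t


def d_and_e(L, N=None):
    """Estimate d(L) and e(L) of approximation (8) from S0 and S1 (sums truncated at N)."""
    p = 2.0 ** -L
    u, v = 1.0 - p, 1.0 - 1.0 / (2**L - 1)
    N = N or 50 * 2**L
    hu, hv = h_table(u, 2 * N + 1, N), h_table(v, 2 * N + 1, N)
    mean = hu[0]
    var = sum(p * u ** (i - 1) * math.log2(i) ** 2 for i in range(1, N + 1)) - mean ** 2
    coef = [0.0] + [math.log2(i) * (u * v) ** (i - 1) for i in range(1, N + 1)]
    S0 = S1 = 0.0
    for k in range(1, N + 1):
        tail = sum(coef[i] * (hu[k + i] - hv[k + i - 1]) for i in range(2, N + 1))
        cov = u ** k * (mean * (hv[k] - hu[k]) + p * tail)
        S0 += cov
        S1 += k * cov
    return 1.0 + 2.0 * S0 / var, -2.0 * S1 / (var * 2**L)


d3, e3 = d_and_e(3)
print(round(d3, 7), round(e3, 7))   # compare with the L = 3 row of Table 1
```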

Although rather complicated to prove (ten pages omitted for lack of space), it is interesting to note that asymptotically:

$$\lim_{L\to\infty}\big(E[f_{TU}(R^N)] - L\big) = C = \int_0^\infty e^{-\xi}\log_2\xi\,d\xi = -0.8327462$$

$$\lim_{L\to\infty}\mathrm{Var}[a_n] = \frac{\pi^2}{6(\ln 2)^2} = 3.4237147$$

$$\lim_{L\to\infty} d(L) = 1-\frac{6}{\pi^2} = 0.3920729$$

$$\lim_{L\to\infty} e(L) = \frac{2}{\pi^2}(4\ln 2 - 1) = 0.3592016$$
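These four limits are closed-form constants, and C can also be cross-checked by quadrature. The sketch below hard-codes the Euler-Mascheroni constant γ (the Python math module does not provide it) and uses the standard identity ∫_0^∞ e^{−ξ} ln ξ dξ = −γ, so that C = −γ/ln 2. The quadrature routine, which substitutes ξ = e^t, is an illustrative assumption.

```python
import math

EULER_GAMMA = 0.5772156649015329          # Euler-Mascheroni constant


def c_by_quadrature(step=0.01, lo=-40.0, hi=5.0):
    """Midpoint rule for C = integral over (0, inf) of e^-x log2(x) dx, with x = e^t."""
    n = int(round((hi - lo) / step))
    total = 0.0
    for m in range(n):
        t = lo + (m + 0.5) * step
        total += math.exp(t - math.exp(t)) * t     # e^-x * ln(x) * dx/dt
    return total * step / math.log(2)


C = -EULER_GAMMA / math.log(2)
var_limit = math.pi ** 2 / (6 * math.log(2) ** 2)
d_limit = 1 - 6 / math.pi ** 2
e_limit = 2 / math.pi ** 2 * (4 * math.log(2) - 1)
print(C, c_by_quadrature(), var_limit, d_limit, e_limit)
```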

The distribution of f_TU can be approximated by the normal distribution of mean E[f_TU(R^N)] and standard deviation:

$$\sigma = c(L,K)\sqrt{\mathrm{Var}[a_n]/K} \qquad (9)$$

E[f_TU(R^N)], Var[a_n], d(L) and e(L) are listed in table 1 for 3 ≤ L ≤ 16 and L → ∞.



L     E[f_TU(R^N)]     Var[a_n]     d(L)         e(L)
3     2.4016068        1.9013347    0.2732725    0.4890883
4     3.3112247        2.3577369    0.3045101    0.4435381
5     4.2534266        2.7045528    0.3296587    0.4137196
6     5.2177052        2.9540324    0.3489769    0.3941338
7     6.1962507        3.1253919    0.3631815    0.3813210
8     7.1836656        3.2386622    0.3732189    0.3730195
9     8.1764248        3.3112009    0.3800637    0.3677118
10    9.1723243        3.3564569    0.3845867    0.3643695
11    10.1700323       3.3840870    0.3874942    0.3622979
12    11.1687649       3.4006541    0.3893189    0.3610336
13    12.1680703       3.4104380    0.3904405    0.3602731
14    13.1676926       3.4161418    0.3911178    0.3598216
15    14.1674884       3.4194304    0.3915202    0.3595571
16    15.1673788       3.4213083    0.3917561    0.3594040
∞     L − 0.8327462    3.4237147    0.3920729    0.3592016

Table 1. E[f_TU(R^N)], Var[a_n], d(L) and e(L) for 3 ≤ L ≤ 16 and L → ∞



4 How accurate is Maurer’s test ? 

Let c'(L, K) be Maurer’s approximation for c(L, K), and let σ' be the standard deviation calculated under this approximation:

$$c'(L,K) = 0.7 - \frac{0.8}{L} + \Big(1.6 + \frac{12.8}{L}\Big)K^{-4/L} \qquad (10)$$

$$\sigma' = c'(L,K)\sqrt{\mathrm{Var}[a_n]/K}$$
Letting y' be the approximated number of standard deviations away from the mean allowed for f_TU(s^N), a device is rejected if and only if f_TU(s^N) < t_1 or f_TU(s^N) > t_2, where t_1 and t_2 are defined by:

$$t_1 = E[f_{TU}(R^N)] - y'\sigma' \qquad\text{and}\qquad t_2 = E[f_{TU}(R^N)] + y'\sigma'$$

y' is chosen such that N(−y') = p'/2, where p' is the approximated rejection rate. N(x) is the integral of the normal density function [3], defined as:

$$N(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\xi^2/2}\,d\xi$$

The actual number of allowed standard deviations is consequently given by y = y'σ'/σ, yielding a rejection rate of p = 2N(−y) = 2N(−y'σ'/σ).

The worst and average ratios p'/p are listed in table 2 for 3 ≤ L ≤ 16, 1000 × 2^L ≤ K ≤ 4000 × 2^L and p' = 0.001 (i.e. y' = 3.30), as suggested in [5]. These figures show that the inaccuracy due to (10) can make the test 2.67 times more permissive than what is theoretically admitted.

The correct thresholds t_1 and t_2 can now be precisely computed using formulae (8), (9) and:

$$t_1 = E[f_{TU}(R^N)] - y\sigma \qquad\text{and}\qquad t_2 = E[f_{TU}(R^N)] + y\sigma$$

where y is chosen such that N(−y) = p/2 and p is the rejection rate.
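The recipe above needs only E[f_TU(R^N)], Var[a_n], d(L) and e(L) from Table 1 plus an inverse normal CDF. The sketch below uses Python's statistics.NormalDist for N(x) and its inverse; the function names are hypothetical. Its last line reproduces the kind of p'/p comparison reported in Table 2 for L = 16 and K = 1000 × 2^L, assuming y' = −N^{−1}(p'/2).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()                 # N(x) of the text is STD_NORMAL.cdf(x)


def maurer_c_prime(L, K):
    """Maurer's approximation (10): c'(L, K) = 0.7 - 0.8/L + (1.6 + 12.8/L) K^(-4/L)."""
    return 0.7 - 0.8 / L + (1.6 + 12.8 / L) * K ** (-4.0 / L)


def corrected_c(d, e, L, K):
    """c(L, K) from approximation (8): c^2 = d(L) + e(L) 2^L / K."""
    return math.sqrt(d + e * 2**L / K)


def thresholds(mean, var, c, K, rejection_rate):
    """t1, t2 = E[f_TU] -/+ y sigma, with N(-y) = p/2 and sigma = c sqrt(Var/K) as in (9)."""
    y = -STD_NORMAL.inv_cdf(rejection_rate / 2)
    sigma = c * math.sqrt(var / K)
    return mean - y * sigma, mean + y * sigma


def actual_rejection_rate(p_prime, c_prime, c):
    """p = 2 N(-y' sigma'/sigma), where N(-y') = p'/2 and sigma'/sigma = c'/c."""
    y_prime = -STD_NORMAL.inv_cdf(p_prime / 2)
    return 2 * STD_NORMAL.cdf(-y_prime * c_prime / c)


# Table 1 row for L = 16
L, K = 16, 1000 * 2**16
mean_L16, var_L16, d_L16, e_L16 = 15.1673788, 3.4213083, 0.3917561, 0.3594040
c = corrected_c(d_L16, e_L16, L, K)
c_prime = maurer_c_prime(L, K)
t1, t2 = thresholds(mean_L16, var_L16, c, K, 0.001)
print(t1, t2, 0.001 / actual_rejection_rate(0.001, c_prime, c))   # last value: p'/p
```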



L     lim_{K→∞} c'(L,K)    lim_{K→∞} c(L,K)    worst p'/p    average p'/p
3     0.4333333            0.5227547           0.1541921     0.1547350
4     0.5000000            0.5518244           0.3462276     0.3464583
5     0.5400000            0.5741591           0.5058411     0.5097624
6     0.5666667            0.5907426           0.6245271     0.6394724
7     0.5857143            0.6026454           0.7215661     0.7565605
8     0.6000000            0.6109165           0.8118111     0.8775954
9     0.6111111            0.6164930           1.0607613     1.0117992
10    0.6200000            0.6201505           1.2317137     1.1634270
11    0.6272727            0.6224903           1.4245388     1.3337681
12    0.6333333            0.6239543           1.6386583     1.5223726
13    0.6384615            0.6248524           1.8723810     1.7278139
14    0.6428571            0.6253941           2.1234364     1.9481901
15    0.6466667            0.6257157           2.3893840     2.1814850
16    0.6500000            0.6259042           2.6678142     2.4257316

Table 2. A comparison of Maurer’s {c', p'} and the actual {c, p} values.
5 The entropy conjecture



Maurer’s test parameter is closely related to the source’s per-bit entropy, which measures the effective key-size of a cryptosystem keyed by the source’s output. [5] gives the following result, which applies to every binary ergodic stationary source S with finite memory:

$$\lim_{L\to\infty}\frac{E[f_{TU}(U_S^N)]}{L} = H_S \qquad (11)$$



where H_S is the source’s per-bit entropy. Moreover, [5] conjectures that (11) can be further refined as:

$$\lim_{L\to\infty}\big[E[f_{TU}(U_S^N)] - L H_S\big] = C = \int_0^\infty e^{-\xi}\log_2\xi\,d\xi = -0.8327462$$



In this section we show that the conjecture is false and that the correct asymptotic relation between E[f_TU(U_S^N)] and the source’s entropy is:

$$\lim_{L\to\infty}\Big[E[f_{TU}(U_S^N)] - \sum_{i=1}^{L} F_i\Big] = C$$

where F_i is the entropy of the i-th order approximation of the source, and:

$$\lim_{L\to\infty} F_L = H_S$$

5.1 Statistical model for a random source 

Consider a source S emitting a sequence U_1, U_2, U_3, … of binary random variables. S is a finite memory source if there exists a positive integer M such that the conditional probability distribution of U_n, given U_1, …, U_{n−1}, only depends on the last M emitted bits:

$$P_{U_n\mid U_1\ldots U_{n-1}}(u_n\mid u_1\ldots u_{n-1}) = P_{U_n\mid U_{n-M}\ldots U_{n-1}}(u_n\mid u_{n-M}\ldots u_{n-1})$$

for n > M and for every binary sequence [u_1, …, u_n] ∈ {0, 1}^n. The smallest M is called the memory of the source. The probability distribution of U_n is thus determined by the source’s state S_n = [U_{n−M}, …, U_{n−1}] at step n. The source is stationary if it satisfies:

$$P_{U_n\mid S_n}(u\mid\sigma) = P_{U_1\mid S_1}(u\mid\sigma)$$

for all n > M, for u ∈ {0, 1} and σ ∈ {0, 1}^M. The state-sequence of a stationary source with memory M forms a finite Markov chain: the source can be in a finite number (actually 2^M) of states σ_i, 0 ≤ i ≤ 2^M − 1, and there is a set of transition probabilities Pr[σ_j|σ_i], expressing the odds that if the system is in state σ_i it will next go to state σ_j. For a general treatment of Markov chains, the reader is referred to [1]. In the case of a source with memory M, each of the 2^M states has at most two successor states with non-zero probability, depending on whether a zero or a one is emitted. The transition probabilities are thus determined by the set of conditional probabilities p_i = Pr[1|σ_i], 0 ≤ i ≤ 2^M − 1, of emitting a one from each state σ_i. The entropy of state σ_i is then H_i = H(p_i), where H is the binary entropy function:

$$H(x) = -x\log_2 x - (1-x)\log_2(1-x)$$

For the class of ergodic Markov processes the probabilities P_j(N) of being in state σ_j after N emitted bits approach (as N → ∞) an equilibrium P_j which must satisfy the system of 2^M linear equations:

$$\sum_{j=0}^{2^M-1} P_j = 1, \qquad P_j = \sum_{i=0}^{2^M-1} P_i\Pr[\sigma_j\mid\sigma_i] \quad\text{for } 0\le j\le 2^M-2$$

The source’s entropy is then the average of the entropies H_i (of states σ_i) weighted by the state-probabilities P_i:

$$H_S = \sum_i P_i H_i \qquad (12)$$
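Equation (12) is easy to evaluate once the equilibrium P_i is known. The sketch below encodes a state as the integer formed by the last M bits. Instead of solving the linear system above directly, it approximates the equilibrium by power iteration, which assumes the chain is ergodic. The encoding and the function names are illustrative assumptions.

```python
import math


def binary_entropy(x):
    """H(x) = -x log2 x - (1 - x) log2(1 - x), with H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def source_entropy(p_one, iterations=1000):
    """H_S = sum_i P_i H(p_i) for a stationary binary source with memory M.

    p_one[s] = Pr[1 | state s], where s is the integer formed by the last M
    bits (most recent bit least significant), so emitting bit b moves the
    source to state ((s << 1) | b) mod 2^M. The equilibrium P is approximated
    by power iteration, which assumes an ergodic chain.
    """
    n_states = len(p_one)                    # 2^M
    P = [1.0 / n_states] * n_states
    for _ in range(iterations):
        nxt = [0.0] * n_states
        for s, ps in enumerate(P):
            nxt[(s << 1) % n_states] += ps * (1 - p_one[s])     # emit 0
            nxt[((s << 1) | 1) % n_states] += ps * p_one[s]    # emit 1
        P = nxt
    return sum(P[s] * binary_entropy(p_one[s]) for s in range(n_states))


print(source_entropy([0.3]))          # memory 0 (biased memoryless source)
print(source_entropy([0.2, 0.8]))     # memory 1: Pr[1|0] = 0.2, Pr[1|1] = 0.8
print(source_entropy([0.3, 0.6]))     # memory 1 with unequal state probabilities
```

For the last source the equilibrium is P_0 = 4/7, P_1 = 3/7, so H_S = (4/7)H(0.3) + (3/7)H(0.6).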

5.2 Asymptotic relation between E[f_TU(U_S^N)] and H_S

The mean of f_TU(U_S^N) for S is given by:

$$E[f_{TU}(U_S^N)] = \sum_{i\ge 1}\Pr[A_n(U_S^N) = i]\log_2 i \qquad (13)$$

with

$$\Pr[A_n(U_S^N) = i] = \sum_{b\in B^L}\Pr[b_n = b,\ b_{n-1}\ne b,\ \ldots,\ b_{n-i+1}\ne b,\ b_{n-i} = b] \qquad (14)$$

Following [6] (theorem 3), the sequences of length L can be looked upon as independent for a sufficiently large L:

$$\Pr[A_n(U_S^N) = i] = \sum_{b\in B^L}\Pr[b]^2(1-\Pr[b])^{i-1}$$

and

$$E[f_{TU}(U_S^N)] = \sum_{b\in B^L}\Pr[b]^2\sum_{i\ge 1}\log_2 i\,(1-\Pr[b])^{i-1}$$

Re-using the function v(r) defined in [5],

$$v(r) = r\sum_{i=1}^{\infty}(1-r)^{i-1}\log_2 i \qquad (15)$$

An Accurate Evaluation of Maurer’s Universal Test 



69 



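we have

$$E[f_{TU}(U_S^N)] = \sum_{b\in B^L}\Pr[b]\,v(\Pr[b])$$

wherefrom one can show that,

$$\lim_{r\to 0}\big[v(r) + \log_2 r\big] = \int_0^\infty e^{-\xi}\log_2\xi\,d\xi = C = -0.8327462 \qquad (16)$$

which yields:

$$\lim_{L\to\infty}\Big[E[f_{TU}(U_S^N)] + \sum_{b\in B^L}\Pr[b]\log_2\Pr[b]\Big] = C \qquad (17)$$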
Let G_L be the per-bit entropy of L-bit blocks:

$$G_L = -\frac{1}{L}\sum_{b\in B^L}\Pr[b]\log_2\Pr[b]$$

then,

$$\lim_{L\to\infty}\big[E[f_{TU}(U_S^N)] - L\times G_L\big] = C$$

Shannon proved ([6], theorem 5) that

$$\lim_{L\to\infty} G_L = H_S$$

which implies that:

$$\lim_{L\to\infty}\frac{E[f_{TU}(U_S^N)]}{L} = H_S$$

Let Pr[b, j] be the probability of a binary sequence b followed by the bit j ∈ {0, 1} and Pr[j|b] = Pr[b, j]/Pr[b] be the conditional probability of bit j after b. Let,

$$F_L = -\sum_{b,j}\Pr[b,j]\log_2\Pr[j\mid b] \qquad (18)$$

where the sum is taken over all sequences b of length L − 1 and j ∈ {0, 1}. We have:

$$F_L = \sum_{b\in B^{L-1}}\Pr[b]\,H(\Pr[1\mid b])$$

and, by virtue of Shannon’s sixth theorem (op. cit.):

$$F_L = L\times G_L - (L-1)\,G_{L-1}, \qquad G_L = \frac{1}{L}\sum_{i=1}^{L} F_i$$

and

$$\lim_{L\to\infty} F_L = H_S$$

wherefrom

$$\lim_{L\to\infty}\Big[E[f_{TU}(U_S^N)] - \sum_{i=1}^{L} F_i\Big] = C$$
wherefrom 
5.3 Refuting the entropy conjecture

F_L is in fact the entropy of the L-th order approximation of S [1,6]. Under such an approximation, only the statistics of binary sequences of length L are considered. After a sequence b of length L − 1 has been emitted, the probability of emitting the bit j ∈ {0, 1} is Pr[j|b]. The L-th order approximation of a source is thus a binary stationary source with at most L − 1 bits of memory, as defined in section 5.1. A source with M bits of memory is then equivalent to its L-th order approximation for L > M, and thus, for all i > M, F_i = H_S, and:

$$\lim_{L\to\infty}\Big[E[f_{TU}(U_S^N)] - \sum_{i=1}^{M} F_i - (L-M)H_S\Big] = C$$

For example, considering a BMS_p (random binary source which emits ones with probability p and zeroes with probability 1 − p, and for which M = 0 and H_S = H(p)), we get the following result given in [5]:

$$\lim_{L\to\infty}\big[E[f_{TU}(U_S^N)] - L\,H(p)\big] = C$$

The conjecture is nevertheless refuted by considering an STP_p, which is a random binary source where a bit is followed by its complement with probability p. An STP_p is thus a source with one bit of memory and two equally-probable states 0 and 1. It follows from (12) and (18) that F_1 = H(1/2) = 1, H_S = H(p), and:

$$\lim_{L\to\infty}\big[E[f_{TU}(U_S^N)] - (L-1)H_S - 1\big] = C$$

which, as soon as p ≠ 1/2 (so that 1 − H(p) > 0), contradicts Maurer’s (7-year-old) entropy conjecture:

$$\lim_{L\to\infty}\big[E[f_{TU}(U_S^N)] - L\,H_S\big] = C$$
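The STP_p counterexample can be checked by brute-force enumeration for small L. The sketch below computes G_L from the block probabilities and F_L directly from definition (18). The usage lines and checks confirm F_L = L·G_L − (L − 1)·G_{L−1}, F_1 = 1 and F_L = H(p) for L ≥ 2. Consequently Σ_{i=1}^{L} F_i = 1 + (L − 1)H(p), which differs from L·H(p) by 1 − H(p). The function names and the value p = 0.2 are illustrative.

```python
import math
from itertools import product


def binary_entropy(x):
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def stp_prob(bits, p):
    """Pr[b] for an STP_p: uniform first bit, then each bit flips with probability p."""
    if not bits:
        return 1.0                          # empty sequence
    prob = 0.5
    for a, b in zip(bits, bits[1:]):
        prob *= p if a != b else 1 - p
    return prob


def G(L, p):
    """Per-bit entropy of L-bit blocks: G_L = -(1/L) sum_b Pr[b] log2 Pr[b]."""
    return -sum(stp_prob(b, p) * math.log2(stp_prob(b, p))
                for b in product((0, 1), repeat=L)) / L


def F(L, p):
    """F_L from definition (18): -sum over b in B^(L-1), j of Pr[b, j] log2 Pr[j | b]."""
    total = 0.0
    for b in product((0, 1), repeat=L - 1):
        for j in (0, 1):
            pbj = stp_prob(b + (j,), p)
            total -= pbj * math.log2(pbj / stp_prob(b, p))
    return total


p = 0.2
for L in range(1, 6):
    print(L, F(L, p), sum(F(i, p) for i in range(1, L + 1)), L * binary_entropy(p))
```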

6 Further research 

Although the universal test is now precisely tuned, a deeper exploration of Maurer’s paradigm still seems in order: for instance, it is possible to design a c(L, K)-less test by using a newly-sampled random sequence for each A_n(s^N) (since in this setting the A_n(s^N) are truly independent, c(L, K) could be replaced by one). Note, however, that this approach considerably increases the total length of the random sequence. Other theoretically interesting generalizations consist in extending the test to non-binary sources or designing tests for comparing generators to biased references (non-BSS ones).
Abstract. In this paper, we present a simple method for generating random-based signatures when random number generators are either unavailable or of suspected quality (malicious or accidental).

In contrast to all past state-machine models, we assume that the signer is a memoryless automaton that starts from some internal state, receives a message, outputs its signature and returns precisely to the same initial state; therefore, the new technique formally converts randomized signatures into deterministic ones.

Finally, we show how to translate the random oracle concept required in 
security proofs into a realistic set of tamper-resistance assumptions. 

1 Introduction 

Most digital signature algorithms rely on random sources whose stability and quality crucially influence security: a typical example is El-Gamal’s scheme, where the secret key is protected by the collision-freedom of the source.

Although biasing tamper-resistant generators is difficult¹, discrete components can be easily short-circuited or replaced by fraudulent emulators.

Unfortunately, for purely technological reasons, combining a micro-controller and a noise generator on the same die is not a trivial engineering exercise, and most of today’s smart-cards do not have real random number generators (traditional substitutes for random sources are keyed state-machines that receive a query, output a pseudo-random number, update their internal state and halt until the next query: a typical example is the BBS generator).

In this paper, we present an alternative approach that converts randomized 
signature schemes into deterministic ones: in our construction, the signer is a 
memoryless automaton that starts from some internal state, receives a message, 
outputs its signature and returns precisely to the same initial state. 

Since the approach is very broad, we will illustrate it with Schnorr’s signature scheme before extending the idea to other randomized cryptosystems.

¹ Such designs are usually buried in the lowest silicon layers and protected by a continuous scanning for sudden statistical defects, extreme temperatures, unusual voltage levels, clock bursts and physical exposure.
2 Digital signatures



At Eurocrypt’96, Pointcheval and Stern proved the security of an El-Gamal variant where the hash-function has been replaced by a random oracle. However, since hash functions are fully specified (non-random) objects, the practical significance of this result was somewhat unclear. The following sections will show how to put this concept to work in practice.

In short, we follow Pointcheval and Stern’s idea of using random oracles², but distinguish two fundamental implementations of such oracles (private and public), depending on their use.

Recall, pro memoria, that a digital signature scheme is defined by a distribution generate over a key-space, a (possibly probabilistic) signature algorithm sign depending on a secret key and a verification algorithm verify depending on the public key (see Goldwasser et al. [3]).

We also assume that sign has access to a private oracle f (which is a part of its private key) while verify has access to the public oracle h that commonly formalizes the hash function transforming the signed message into a digest.

Definition 1. Let Σ^h = (generate, sign^h, verify^h) denote a signature scheme depending on a uniformly-distributed random oracle h. Σ^h is (n, t, ε)-secure against existential-forgery adaptive-attacks if no probabilistic Turing machine, allowed to make up to n queries to h and sign, can forge, with probability greater than ε and within t state-transitions (time), a pair {m, σ} accepted by verify.

More formally, for any (n, t)-limited probabilistic Turing machine 𝒜 that outputs valid signatures or fails, we have:

$$\Pr_{\omega,h}\big[\mathcal{A}^{h,\mathsf{sign}}(\omega)\ \text{succeeds}\big] \le \varepsilon$$

where ω is the random tape.

Figure 1 presents such a bi-oracle variant of Schnorr’s scheme: h is a public (common) oracle while f is a secret oracle (looked upon as a part of the signer’s private key); note that this variant’s verify is strictly identical to Schnorr’s original one.

Definition 2. Let H = (h_K)_{K∈𝒦} : A → B be a family of hash-functions, from a finite set A to a finite set B, where the key K follows a distribution 𝒦. H is an (n, ε)-pseudo-random hash-family if no probabilistic Turing machine 𝒜 can distinguish h_K from a random oracle in less than t state-transitions and n queries, with an advantage greater than ε.

In other words, we require that for all n-limited 𝒜:

$$\Big|\Pr_{\omega,K}\big[\mathcal{A}^{h_K}(\omega)\ \text{accepts}\big] - \Pr_{\omega,\varphi}\big[\mathcal{A}^{\varphi}(\omega)\ \text{accepts}\big]\Big| \le \varepsilon$$

where ω is the random tape and φ is a random mapping from A to B.

² Although, as shown recently, there is no guarantee that a scheme provably secure in the random oracle model will still be secure in reality.

System parameters:
k, security parameter
p and q primes, q | (p − 1)
g ∈ Z_p^* of order q
h : {0, 1}* → Z_q

Key generation:
generate(1^k)
secret: x ∈_R Z_q and f : {0, 1}* → Z_q
public: y = g^x mod p

Signature generation:
sign(m) := {e, s}
u = f(m, p, q, g, y)
r = g^u mod p
e = h(m, r)
s = u − xe mod q

Signature verification:
verify(m; e, s)
r = g^s y^e mod p
check that e = h(m, r)

Fig. 1. A deterministic variant of Schnorr’s scheme.

So far, this criterion has been used in block-cipher design but never in conjunction with hash functions. Actually, Luby and Rackoff proved that a truly random 3-round, ℓ-bit message Feistel-cipher is (n, n²/2^{ℓ/2})-pseudo-random and safe until n = 2^{ℓ/4} messages have been encrypted (this argument was put forward as evidence of DES’ security).

Note that (n, e)-pseudo-randomness was recently shown to be close to the 
notion of n-wise decorrelation bias, investigated by Vaudenay in 

This construction can be adapted to pseudo-random hash-functions as follows: we first show how to construct a pseudo-random hash-function from a huge random string and then simplify the model by de-randomizing the string and shrinking it to what is strictly necessary for providing provable security. Further reductions are still possible, at the cost of additional pseudo-randomness assumptions.

Theorem 1. Let B be the set of ℓ-bit strings and A = B². Let us define two B-to-B functions, denoted F and G, from an ℓ × 2^{ℓ+1}-bit key K = {F, G}. Let h_K(x, y) = y ⊕ G(x ⊕ F(y)). The family (h_K)_K is (n, n²/2^{ℓ+1})-pseudo-random.

Proof. The considered family is nothing but a truncated two-round Feistel con- 
struction and the proof is adapted from and The core of the proof 

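consists in finding a meaningful lower bound for the probability that n different (x_i, y_i)’s produce n given z_i’s. More precisely, the ratio between this probability and its value for a truly random function needs to be greater than 1 − ε. Letting T = x ⊕ F(y), we have:

$$\Pr[h_K(x_iy_i) = z_i;\ i=1,\ldots,n] \ge \Pr[h_K(x_iy_i) = z_i \text{ and } T_i \text{ pairwise different}]$$

$$\ge \Big(1 - \frac{n(n-1)}{2}\max_{i\ne j}\Pr[T_i = T_j]\Big)\,2^{-n\ell}$$

since, once the T_i are pairwise different, the values G(T_i) are independent and uniformly distributed. For any i ≠ j (since x_iy_i ≠ x_jy_j), we either have y_i ≠ y_j, so that Pr[T_i = T_j] = 1/2^ℓ, or y_i = y_j and x_i ≠ x_j, which implies Pr[T_i = T_j] = 0.

Consequently:

$$\Pr[h_K(x_iy_i) = z_i;\ i=1,\ldots,n] \ge \Big(1 - \frac{n(n-1)}{2^{\ell+1}}\Big)\,2^{-n\ell} \ge (1-\varepsilon)\,2^{-n\ell}$$

with ε = n²/2^{ℓ+1}, where 2^{−nℓ} is the corresponding probability for a truly random function.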


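Considering a probabilistic distinguisher using a random tape ω, and denoting by φ a truly random function from A to B, we get:

$$\Pr_{\omega,K}[\mathcal{A}^{h_K}(\omega)\ \text{accepts}] = \sum_{\substack{x_1y_1z_1\ldots x_ny_nz_n\\ \text{accepting}}}\Pr_\omega[x_1y_1z_1\ldots x_ny_nz_n \mid x_iy_i\mapsto z_i]\,\Pr_K[h_K(x_iy_i) = z_i]$$

$$\ge (1-\varepsilon)\sum_{\substack{x_1y_1z_1\ldots x_ny_nz_n\\ \text{accepting}}}\Pr_\omega[x_1y_1z_1\ldots x_ny_nz_n \mid x_iy_i\mapsto z_i]\,\Pr_\varphi[\varphi(x_iy_i) = z_i] = (1-\varepsilon)\Pr_{\omega,\varphi}[\mathcal{A}^{\varphi}(\omega)\ \text{accepts}]$$

and

$$\Pr_{\omega,K}[\mathcal{A}^{h_K}(\omega)\ \text{accepts}] - \Pr_{\omega,\varphi}[\mathcal{A}^{\varphi}(\omega)\ \text{accepts}] \ge -\varepsilon$$

which yields an advantage smaller than ε by symmetry (i.e. by considering another distinguisher that accepts if and only if 𝒜 rejects). □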
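For small ℓ the family of Theorem 1 can be instantiated literally, with F and G stored as lookup tables. The sketch fills the tables from a seeded pseudo-random generator, a stand-in assumed here for illustration in place of the truly random key. It also exhibits the fact used in the proof: for a fixed y, the values T = x ⊕ F(y) are pairwise different, since x ↦ x ⊕ F(y) is a bijection.

```python
import random


def make_key(ell, rng):
    """K = {F, G}: two random tables B -> B on ell-bit strings (ell * 2^(ell+1) key bits)."""
    size = 1 << ell
    F = [rng.randrange(size) for _ in range(size)]
    G = [rng.randrange(size) for _ in range(size)]
    return F, G


def h_K(key, x, y):
    """h_K(x, y) = y XOR G(x XOR F(y)), a truncated two-round Feistel construction."""
    F, G = key
    return y ^ G[x ^ F[y]]


ell = 6
key = make_key(ell, random.Random(98))     # seeded PRNG: illustration only, not a real key
y = 5
T_values = {x ^ key[0][y] for x in range(1 << ell)}   # T = x XOR F(y) for every x
print(len(T_values), h_K(key, 3, y))
```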

Note that this construction can be improved by replacing F by a random linear function: if K = {a, G}, where a is an ℓ-bit string and G an nℓ-bit string defining a random polynomial of degree n − 1, we define h_K(x, y) = y ⊕ G(x ⊕ a × y), where a × y is the product in GF(2^ℓ) (this uses Carter-Wegman’s xor-universal hash function).

More practically, we can use standard hash-functions such as:

h_K(x) = HMAC-SHA(K, x)

at the cost of adding the function’s pseudo-randomness hypothesis to the (already assumed) hardness of the discrete logarithm problem.

To adapt random-oracle-secure signatures to everyday life, we regard (h_K)_K as a pseudo-random keyed hash-family and require indistinguishability between elements of this family and random functions. In engineering terms, this precisely corresponds to encapsulating the hash function in a tamper-resistant device.



Theorem 2. Let H be an (n, ε₁)-pseudo-random hash-family. If the signature scheme Σ^h is (n, t, ε₂)-secure against adaptive-attacks for existential-forgery, where h is a uniformly-distributed random-oracle, then Σ^{h_K} is (n, t, ε₁ + ε₂)-secure as well.
Proof. Let 𝒜 be a Turing machine capable of forging signatures for Σ^{h_K} with a probability greater than ε₁ + ε₂. h_K is distinguished from h by applying 𝒜 and considering whether it succeeds or fails. Since 𝒜 cannot forge signatures for Σ^h with a probability greater than ε₂, the advantage is greater than ε₁, which contradicts the hypothesis. □

3 Implementation 

An interesting corollary of theorem 2 is that if n hashings take more than t seconds, then K can be chosen randomly by a trusted authority, with some temporal validity. In this setting, long-term signatures become very similar to time-stamping.

Another consequence is that random-oracle security proofs are no longer theoretical arguments without practical justification: they become, de facto, a step towards practical and provably-secure schemes using pseudo-random hash families. However, the key has to remain secret, which forces the implementer to distinguish two types of oracles:

• A public random oracle h, which could be implemented as a keyed pseudo-random hash function protected in all tamper-resistant devices (signers and verifiers).

• A private random oracle f, which in practice could also be any pseudo-random hash-function keyed with a secret (unique to each signature device) generated by generate.

An efficient variant of Schnorr’s scheme, provably secure in the standard model under the tamper-resistance assumption, the existence of one-way functions and the DLP’s hardness, is depicted in figure 2.

The main motivation behind our design is to provide a memoryless pseudo-random generator, which removes the need to keep dynamic information about the generator’s state. In essence, the advocated methodology is very cheap in terms of entropy, as one can re-use the already existing key-material for generating randomness.

Surprisingly, the security of realistic random-oracle implementations is enhanced by using intentionally slow devices:

• use a slow implementation (e.g. 0.1 seconds per query) of a (2^{40}, 1/2000)-pseudo-random hash-family.

• consider an attacker having access to 1000 such devices during 2 years (≈ 2^{26} seconds).

— consider Schnorr’s scheme, which is (n,t,2^^nt/ToL)secure in the random 
oracle model, where Tdl denotes the inherent complexity of the DLP Q. 

For example, {|p| = 512, j^l = 256}-discrete logarithms can not be computed 
in less than 2®® seconds (= a 10,000-processor machine performing 1,000 modular 
multiplications per processor per second, executing Shank’s baby-step giant-step 
algorithm ^]) and theorem^guarantees that within two years, no attacker can 
System parameters:
k, security parameter
p and q primes, q | (p − 1)
g ∈ Z_p^* of order q
(h_v : {0, 1}* → Z_q)_{v∈𝒦} pseudo-random hash-family
v ∈_R 𝒦 secret key (same in all tamper-resistant devices)

Key generation:
generate(1^k)
secret: x ∈_R Z_q and z ∈_R 𝒦
public: y = g^x mod p

Signature generation:
sign(m) := {e, s}
u = h_z(m, p, q, g, y)
r = g^u mod p
e = h_v(m, r)
s = u − xe mod q

Signature verification:
verify(m; e, s)
r = g^s y^e mod p
check that e = h_v(m, r)

Fig. 2. A provably-secure deterministic Schnorr variant.



succeed in an existential forgery under an adaptive attack with probability greater than 1/1000.

This proves that realistic low-cost implementation and provable security can coexist in harmony. Should a card be compromised, the overall system security will simply become equivalent to that of Schnorr’s original scheme.
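As a rough illustration of figure 2, the sketch below instantiates both keyed oracles with HMAC-SHA-256 reduced modulo q, in the spirit of the HMAC-SHA suggestion above. Several choices are assumptions made for readability: the toy group (p = 2039, q = 1019, g = 4), the textual encoding of the hashed tuples, and all function names. The parameters are far too small to be secure.

```python
import hashlib
import hmac
import secrets

# Toy Schnorr group (insecure, illustration only): p = 2q + 1 and g = 4 has order q.
P, Q, G = 2039, 1019, 4


def h_keyed(key: bytes, *items) -> int:
    """Keyed hash into Z_q: HMAC-SHA-256 over a '|'-joined textual encoding (an assumption)."""
    msg = "|".join(str(item) for item in items).encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big") % Q


def generate():
    """Key generation of figure 2: secret (x, z), public y = g^x mod p."""
    x = secrets.randbelow(Q - 1) + 1
    z = secrets.token_bytes(32)             # key of the private oracle h_z
    return (x, z), pow(G, x, P)


def sign(m, secret, y, v: bytes):
    x, z = secret
    u = h_keyed(z, m, P, Q, G, y)           # u = h_z(m, p, q, g, y): no random source needed
    r = pow(G, u, P)
    e = h_keyed(v, m, r)                    # e = h_v(m, r), v shared by all devices
    return e, (u - x * e) % Q


def verify(m, sig, y, v: bytes) -> bool:
    e, s = sig
    r = pow(G, s, P) * pow(y, e, P) % P     # g^s y^e = g^u mod p
    return e == h_keyed(v, m, r)


secret, y = generate()
v = b"device key shared by signers and verifiers"
sig = sign("hello", secret, y, v)
print(sig, verify("hello", sig, y, v))
```

Signing the same message twice with the same keys returns the same {e, s}, which is the memoryless behaviour the construction aims at.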

Finally, we would like to put forward a variant (see figure 3) which is not provably secure but presents the attractive property of being fully deterministic (a given message m will always yield the same signature):

Lemma 1. Let {r_1, s_1} and {r_2, s_2} be two Schnorr signatures, generated by the same signer using algorithm 3; then {r_1, s_1} = {r_2, s_2} ⟺ m_1 = m_2.

Proof. If m_1 = m_2 = m then r_1 = r_2 = g^{h(x,m,p,q,g,y)} = r mod p, e_1 = e_2 = h(m, r) = e mod q and s_1 = h(x, m, p, q, g, y) − xe mod q = s_2 = s, therefore {r_1, s_1} = {r_2, s_2}.

To prove the converse, observe that if r_1 = r_2 = r then g^{u_1} = g^{u_2} mod p, meaning that u_1 = u_2 = u. Furthermore, s_1 = u − xe_1 = u − xe_2 = s_2 mod q implies that e_1 = h(m_1, r) = h(m_2, r) = e_2 mod q; consequently, unless we found a collision, m_1 = m_2. □

Industrial motivation: this feature is a cheap protection against direct physical attacks on the signer’s noise-generator (corrupting the source to obtain the same u twice, which for two different messages reveals x, since s_1 − s_2 = x(e_2 − e_1) mod q).
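The motivation can be made concrete with a toy computation. If a corrupted noise source makes the signer use the same u for two messages whose challenges differ, then x = (s_1 − s_2)/(e_2 − e_1) mod q. The sketch below shows this recovery and, for contrast, the figure 3 rule u = h(x, m, p, q, g, y) mod q. The toy group, the use of SHA-256 as h and the function names are illustrative assumptions, not the paper's parameters.

```python
import hashlib

P, Q, G = 2039, 1019, 4   # toy Schnorr group: p = 2q + 1, g = 4 of order q (insecure)


def H(*items) -> int:
    """Hash into Z_q; SHA-256 over a textual encoding is an assumption for this sketch."""
    data = "|".join(str(item) for item in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def sign_with_u(m, x, u):
    """Schnorr signing with an externally supplied u (what a noise generator would provide)."""
    r = pow(G, u, P)
    e = H(m, r)
    return e, (u - x * e) % Q


def sign_fig3(m, x, y):
    """Figure 3 variant: u = h(x, m, p, q, g, y) mod q, so no random source is needed."""
    return sign_with_u(m, x, H(x, m, P, Q, G, y))


def verify(m, sig, y) -> bool:
    e, s = sig
    return e == H(m, pow(G, s, P) * pow(y, e, P) % P)


def recover_x(sig1, sig2):
    """Same u, different challenges: s1 - s2 = x (e2 - e1) mod q."""
    (e1, s1), (e2, s2) = sig1, sig2
    return (s1 - s2) * pow(e2 - e1, -1, Q) % Q


x = 123
y = pow(G, x, P)
sig1 = sign_with_u("pay 10", x, 777)
for other in ("pay 20", "pay 30", "pay 40"):
    sig2 = sign_with_u(other, x, 777)        # u reused: the failure mode described above
    if sig2[0] != sig1[0]:
        break
print(recover_x(sig1, sig2), sign_fig3("pay 10", x, y))
```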
System parameters:
k, security parameter
p and q prime numbers such that q | (p − 1)
g ∈ Z_p^* of order q
h, hash function

Key generation:
generate(1^k)
secret: x ∈_R Z_q
public: y = g^x mod p

Signature generation:
sign(m) := {e, s}
u = h(x, m, p, q, g, y) mod q
r = g^u mod p
e = h(m, r) mod q
s = u − xe mod q

Signature verification:
verify(m; e, s)
r = g^s y^e mod p
check that e = h(m, r) mod q

Fig. 3. A practical deterministic Schnorr variant.



4 Deterministic versions of other schemes 

The idea described in the previous sections can be trivially applied to other 
signature schemes such as or Suffice it to say that one should replace 
each session’s random number by a digest of the keys (secret and public) and 
the signed message. 

Blind signatures (a popular building-block of most e-cash schemes) can be easily transformed as well: in the usual RSA setting the user computes w = h(k, m, e, n) (where k is a short secret-key) and sends m' = w^e m mod n to the authority, who replies with s' = m'^d mod n, which the user un-blinds by a modular division (s = s'/w = m^d mod n).

The “blinding” technique can also be used to prevent timing-attacks, but it again requires a random blinding factor.
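The sketch below is a toy run of the deterministic blinding step. It uses the textbook RSA key n = 3233 = 61 × 53, e = 17, d = 2753 and SHA-256 for h; both are assumptions for illustration only. Since w must be invertible modulo n, it also re-hashes with a counter until gcd(w, n) = 1, a detail the text does not specify.

```python
import hashlib
from math import gcd

N, E, D = 3233, 17, 2753        # textbook toy RSA key: n = 61 * 53 (insecure)


def blinding_factor(k: bytes, m: int) -> int:
    """w = h(k, m, e, n) mod n, re-hashed with a counter until w is invertible mod n."""
    counter = 0
    while True:
        data = b"|".join([k, str(m).encode(), str(E).encode(), str(N).encode(),
                          str(counter).encode()])
        w = int.from_bytes(hashlib.sha256(data).digest(), "big") % N
        if w > 1 and gcd(w, N) == 1:
            return w
        counter += 1


def blind(k: bytes, m: int):
    w = blinding_factor(k, m)
    return w, pow(w, E, N) * m % N            # m' = w^e m mod n


def authority_sign(m_blinded: int) -> int:
    return pow(m_blinded, D, N)                # s' = m'^d mod n = w m^d mod n


def unblind(s_blinded: int, w: int) -> int:
    return s_blinded * pow(w, -1, N) % N      # s = s' / w = m^d mod n


k, m = b"short user secret", 65
w, m_blinded = blind(k, m)
s = unblind(authority_sign(m_blinded), w)
print(s == pow(m, D, N), pow(s, E, N) == m)
```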

More fundamentally, our technique completely eliminates a well-known attack on McEliece’s cryptosystem where, by asking the sender to re-encrypt logarithmically many messages, one can filter out the error vectors (e, chosen randomly by the sender at each encryption) through simple majority votes.

We refer the reader to section ill. 1.4. A. C of | for more detailed description 
of this attack (that disappears by replacing e by a hash-value of m and the 
receiver’s public-keys) . 
Abstract. The problem of finite field basis conversion is to convert from the representation of a field element in one basis to the representation of the element in another basis. This paper presents new algorithms for the problem that require much less storage than previous solutions. For the finite field GF(2^m), for example, the storage requirement of the new algorithms is only O(m) bits, compared to O(m²) for previous solutions.

With the new algorithms, it is possible to extend an implementation in 
one basis to support other bases with little additional cost, thereby pro- 
viding the desired interoperability in many cryptographic applications. 

1 Introduction 

Finite field arithmetic is becoming increasingly important in today’s computer systems, particularly for cryptographic operations. Among the more common finite fields in cryptography are odd-characteristic finite fields of degree 1, conventionally known as GF(p) arithmetic or arithmetic modulo a prime, and even-characteristic finite fields of degree greater than 1, conventionally known as GF(2^m) arithmetic, where m is the degree. Arithmetic in GF(2^m) (or any finite field of degree greater than 1) can be further classified according to the choice of basis for representing elements of the finite field; two common choices are a polynomial basis and a normal basis.

For a variety of reasons, including cost, performance, and compatibility with other applications, implementations of GF(2^m) arithmetic vary in their choice of basis. The variation in choice affects interoperability, since field elements represented in one basis cannot be operated on directly in another basis. The problem
of interoperability limits the applicability of implementations to cryptographic 
communication. As an example, if two parties wish to communicate with cryp- 
tographic operations and each implements finite field arithmetic in a different 
basis, then at least one party must do some conversions, typically before or after 
communicating a field element or at certain points in the cryptographic opera- 
tions. Otherwise, the results of the cryptographic operations will be different. 

It is well known that it is possible to convert between two choices of basis for 
a finite field; the general method involves a matrix multiplication. However, the 
matrix is often too large. For instance, the change-of-basis matrix for $GF(2^m)$
arithmetic will have $m^2$ entries, requiring several thousand bytes or more of
storage in typical applications (e.g., $m \approx 160$). While such a matrix may be
reasonable to store in a software implementation, it is likely to be a significant
burden in a low-cost hardware implementation.

We describe in this paper new algorithms for basis conversion for normal 
and polynomial bases that require much less storage than previous solutions. 
Our algorithms are also very efficient in that they involve primarily finite-field 
operations, rather than, for instance, matrix multiplications. This has the advan- 
tage of benefiting from the optimizations that are presumably already available 
for finite-field operations.

With our algorithms, it is possible to extend an implementation in one basis 
so that it supports other choices of basis, with only a small additional cost in 
terms of circuitry, program size, or storage, relative to typical cryptographic 
applications of finite-field arithmetic, such as elliptic curve cryptosystems. Our 
work applies both to even-characteristic and odd-characteristic finite fields of 
degree greater than one, though even-characteristic arithmetic is the most likely 
application of our work, since it is more common in practice. We also suggest 
how to generalize our algorithms to choices of basis other than polynomial and
normal bases.

2 Background 

In this section, we introduce some basic notation and definitions. Let $GF(q)$
denote the finite field with $q$ elements, where $q = p^r$ for some prime $p$ and integer
$r \ge 1$. The characteristic of the field is the prime $p$. For even-characteristic fields,
we have $p = 2$. Throughout the paper, we use $GF(q^m)$ to denote the finite field
defined over the ground field $GF(q)$; the degree of $GF(q^m)$ over $GF(q)$ is $m$.

A basis for the finite field $GF(q^m)$ is a set of $m$ elements $\omega_0, \ldots, \omega_{m-1} \in GF(q^m)$
such that every element of the finite field can be represented uniquely
as a linear combination of basis elements. That is, given an element $\varepsilon \in GF(q^m)$,
we can write

$$\varepsilon = \sum_{i=0}^{m-1} B[i]\,\omega_i,$$

where $B[0], \ldots, B[m-1] \in GF(q)$ are the coefficients. The row vector
$B = (B[0], \ldots, B[m-1])$ is called the representation of the element $\varepsilon$ in the
basis $\omega_0, \ldots, \omega_{m-1}$. Once the basis is chosen, rules for field operations (such as
addition, multiplication, inversion) can be derived.

Elements of a finite field can be represented in a variety of ways, depending
on the choice of basis for the representation. Two common choices of basis are a
polynomial basis and a normal basis. In a polynomial basis, the basis elements
are successive powers of an element $\gamma$ (called the generator), that is, $\omega_i = \gamma^i$. The
element $\gamma$ must satisfy certain properties, namely that the powers $\gamma^0, \ldots, \gamma^{m-1}$
are linearly independent. In a normal basis, the basis elements are successive
exponentiations of an element $\gamma$ (again called the generator), that is, $\omega_i = \gamma^{q^i}$.
In this case, the successive exponentiations must be linearly independent. Each
basis has its own advantages and disadvantages in terms of implementation, and 
some discussions can be found in 
The basis conversion or change-of-basis problem is to compute the representation
of an element of a finite field in one basis, given its representation in
another basis. The general solution to the problem is to apply the change-of-basis
matrix relating the two bases. Suppose that we are converting from the
representation $B$ of $\varepsilon$ in the basis $\omega_0, \ldots, \omega_{m-1}$ to another basis. Let $W_i$ be the
representation of the element $\omega_i$ in the second basis, and let $M$, the change-of-basis
matrix, be the $m \times m$ matrix whose columns are $W_0, \ldots, W_{m-1}$. It follows
that the representation $A$ of the element $\varepsilon$ in the second basis can be computed
as the matrix-vector product

$$A^T = M B^T,$$

where we view $A$ and $B$ as row vectors of dimension $m$. A change-of-basis matrix is
invertible, and we can convert in the reverse direction by computing $B^T = M^{-1} A^T$.
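For comparison with the storage-efficient algorithms that follow, here is a minimal Python sketch of the change-of-basis-matrix method in a toy field. It assumes $q = 2$, $m = 8$, an internal polynomial basis modulo $x^8 + x^4 + x^3 + x + 1$ (bit $i$ of an integer holds the coefficient of $x^i$), and an external polynomial basis with the hypothetical generator $\gamma = x + 1$ (internal value `0x03`); none of these choices come from the paper. Both $M$ and $M^{-1}$ are kept as $m$ row bitmasks each, i.e., $2m^2$ bits in total.

```python
DEGREE = 8
MODULUS = 0x11B  # assumed internal basis: polynomials modulo x^8 + x^4 + x^3 + x + 1
GAMMA = 0x03     # assumed external generator gamma = x + 1 (internal representation)


def gf_mul(a, b):
    """Multiply two internal representations (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> DEGREE) & 1:  # reduce modulo the field polynomial
            a ^= MODULUS
    return result


def gf_pow(a, e):
    """Square-and-multiply exponentiation in the internal basis."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def columns_to_rows(cols):
    """Turn columns W_0..W_{m-1} into row bitmasks: bit c of row r is M[r, c]."""
    return [sum(((cols[c] >> r) & 1) << c for c in range(DEGREE)) for r in range(DEGREE)]


def invert_rows(rows):
    """Gauss-Jordan inversion of an invertible matrix over GF(2)."""
    a, inv = list(rows), [1 << r for r in range(DEGREE)]
    for c in range(DEGREE):
        p = next(r for r in range(c, DEGREE) if (a[r] >> c) & 1)  # pivot row
        a[c], a[p] = a[p], a[c]
        inv[c], inv[p] = inv[p], inv[c]
        for r in range(DEGREE):
            if r != c and (a[r] >> c) & 1:
                a[r] ^= a[c]
                inv[r] ^= inv[c]
    return inv


def matvec(rows, v):
    """Matrix-vector product over GF(2): bit r is the parity of rows[r] & v."""
    return sum((bin(rows[r] & v).count("1") & 1) << r for r in range(DEGREE))


# Column i of M is W_i, the internal representation of gamma^i (m^2 bits).
M_ROWS = columns_to_rows([gf_pow(GAMMA, i) for i in range(DEGREE)])
# The reverse direction needs M^{-1}: another m^2 bits.
M_INV_ROWS = invert_rows(M_ROWS)


def to_internal(B):
    """A^T = M B^T, where bit i of B is the external coefficient B[i]."""
    return matvec(M_ROWS, B)


def to_external(A):
    """B^T = M^{-1} A^T."""
    return matvec(M_INV_ROWS, A)
```

With these toy parameters the two tables occupy $2 \cdot 8^2 = 128$ bits; for $m = 160$ the same layout would need 51,200 bits (6,400 bytes), which is the kind of cost the new algorithms avoid.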

The change-of-basis-matrix solution is straightforward and effective. But it is 
limited by two factors. First, the matrix $M$ is potentially quite large, consisting
of $m^2$ coefficients. Moreover, if we wish to convert in both directions, we must
store the matrix $M^{-1}$ as well, or else compute it, which could be time-consuming.
Second, the operations involved in computing the matrix-vector product, while
involving coefficients in the ground field, are not necessarily implementable with 
operations in either basis. Thus, the conversion process may not be as efficient 
as we would like. 

Another approach to conversion is to multiply by elements of a dual basis 
(see Page 58 of |), but the storage requirement will again be quite large, if the 
entire dual basis is stored. 

Our objective is to overcome the difficulties of the approaches just described. 
We wish to convert from one basis to another without involving a large amount 
of storage or requiring a large number of operations. And, we would like to take 
advantage of the built-in efficiency of finite field operations in one basis, rather 
than implementing new operations, such as matrix multiplications. We will call 
the basis in which finite field operations are primarily performed the internal 
basis. The other basis will be called the external basis. The conversion operation 
from the external basis to the internal basis will be called an import operation, 
and the reverse an export operation. 

The specific problems to be solved are thus as follows. 

- Import problem. Given an internal basis and an external basis for a finite
field $GF(q^m)$ and the representation $B$ of a field element in the external basis
(the external representation), determine the corresponding representation $A$
of the same field element in the internal basis (the internal representation)
primarily with internal-basis operations, and with minimal storage.

- Export problem. Given an internal basis and an external basis for a finite field
$GF(q^m)$ and the internal representation $A$ of a field element, determine the
corresponding external representation $B$ of the same field element primarily
with internal-basis operations, and with minimal storage.

The more general problem of converting from one basis to another with 
operations in a third basis is readily solved by importing to and re-exporting 
from the third basis; thus, our algorithms for converting to and from an internal 
basis will suffice for the more general problem. 
3 Conversion Algorithms

In this section, we present four conversion algorithms for the import and export
problems. We will focus our discussion on the case that both bases are defined
over the same ground field $GF(q)$, and that the coefficients in the ground field
are represented the same way in both bases. Section 4 addresses the case that
the bases are defined over different ground fields, or that the coefficients are
represented differently.

We require that the external basis is a polynomial basis or a normal basis,
so that elements in the external basis have either the form

$$\varepsilon = \sum_{i=0}^{m-1} B[i]\,\gamma^{i} \tag{1}$$

or the form

$$\varepsilon = \sum_{i=0}^{m-1} B[i]\,\gamma^{q^{i}} \tag{2}$$

where $\gamma$ is the generator of the external basis and $B[0], \ldots, B[m-1] \in GF(q)$ are
the coefficients of the external representation. However, as discussed in Section
4, similar algorithms may be constructed for other choices of basis. In addition,
we assume that the internal representation $G$ of the generator $\gamma$ is given, which
is reasonable in most practical settings.¹

We make no assumptions on the internal basis, other than that it is defined
over the same ground field $GF(q)$ and that the representation of the coefficients
in the ground field is the same. Our algorithms involve the same sequence of
operations whether the internal basis is a polynomial basis, a normal basis, or
some other type of basis. Thus, as examples, our algorithms can convert from a
polynomial basis to a normal basis, from a normal basis to a polynomial basis,
from a polynomial basis with one generator to a polynomial basis with another
generator, or from a normal basis with one generator to a normal basis with another
generator, to give a few possibilities.

We assume that addition, subtraction, and multiplication operations are 
readily available in the internal basis. A special case of multiplication, which 
can be more efficient, is scalar multiplication, where one of the operands is a 
coefficient, i.e., an element of the ground field. 

In the following, we denote by I the internal representation of the identity 
element. 



¹ If not, it can be computed given information about the internal and external bases,
such as the minimal polynomial of the generator. There may be several acceptable
internal representations of the generator, and hence several equivalent internal
representations of a given element. For interoperability we need only that conversion
into and out of the internal basis involve the same choice of $G$.
3.1 Basic Techniques for Conversion

Before presenting our conversion algorithms, we first describe some useful tech- 
niques which will serve as the building blocks of the conversion algorithms. 

Algorithms for importing from an external basis can be constructed based on 
a direct computation of Equations (1) or (2). Since the internal representation
$G$ of the generator $\gamma$ is given, we can easily convert each external basis element
into its internal representation using only operations in the internal basis. In
Sections 3.2 and 3.3, we give alternatives to these algorithms which are amenable 
to further performance improvements. 

Algorithms for exporting to an external basis, however, cannot be constructed 
in the same direct manner. The major obstacle comes from the following fact: 
It is not obvious how to convert each internal basis element into its external 
representation using only operations in the internal basis, even given the external 
representation of the generator. So instead of converting the basis element, we 
will use some new techniques, and they are described in the following three 
lemmas. 

Lemma 1. Suppose the external basis is a polynomial basis with a generator $\gamma$.
Let $B$ be the external representation of an element $\varepsilon$, and let $B'$ be the external
representation of the element $\varepsilon\gamma^{-1}$. Then for all indexes $0 \le i < m-1$,

$$B'[i] = B[i+1],$$

and $B'[m-1] = 0$, provided that $B[0] = 0$.

Lemma 1 shows that if the external basis is a polynomial basis, then multiplication
by the inverse $\gamma^{-1}$ of the generator $\gamma$ shifts the coefficients down,
provided that the coefficient at index 0 is initially 0. The result leads to an
algorithm for exporting to a polynomial basis: compute the coefficient $B[0]$, subtract
$B[0]$, multiply by $G^{-1}$, and repeat, computing successive coefficients of $B$.

Related to this, multiplication by the generator $\gamma$ shifts coefficients up, provided
that $B[m-1] = 0$. Rotation of the coefficients in either direction is also
possible, though we will not need it for our algorithms.

Lemma 2. Suppose the external basis is a normal basis. Let $B$ be the external
representation of an element $\varepsilon$, and let $B'$ be the external representation of the
element $\varepsilon^q$. Then for all indexes $0 \le i \le m-1$,

$$B'[i] = B[(i-1) \bmod m].$$

Lemma 2 shows that if the external basis is a normal basis, then raising
to the power $q$ rotates the coefficients up. The result leads to an algorithm for
exporting to a normal basis: compute the coefficient $B[m-1]$, raise to the power
$q$, and repeat.
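Lemmas 1 and 2 can be checked exhaustively in a toy field. This Python sketch assumes $q = 2$, $m = 8$ and an internal polynomial basis modulo $x^8 + x^4 + x^3 + x + 1$; the polynomial-basis generator `0x03` and the search for a normal-basis generator are illustrative choices, not taken from the paper. External representations are found by brute force over all $2^m$ coefficient vectors, which is feasible only because the field is tiny.

```python
DEGREE = 8
MODULUS = 0x11B  # assumed internal basis: x^8 + x^4 + x^3 + x + 1
MASK = (1 << DEGREE) - 1


def gf_mul(a, b):
    """Multiply two internal representations (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> DEGREE) & 1:
            a ^= MODULUS
    return result


def gf_pow(a, e):
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def span_table(basis):
    """Map each external bitmask B to the internal rep of sum_i B[i] * basis[i]."""
    table = {}
    for B in range(1 << DEGREE):
        acc = 0
        for i in range(DEGREE):
            if (B >> i) & 1:
                acc ^= basis[i]
        table[B] = acc
    return table


def is_basis(elems):
    """The elements form a basis iff all 2^m combinations are distinct."""
    return len(set(span_table(elems).values())) == 1 << DEGREE


poly_basis = [gf_pow(0x03, i) for i in range(DEGREE)]  # assumed gamma = x + 1
gamma_norm = next(g for g in range(2, 1 << DEGREE)
                  if is_basis([gf_pow(g, 1 << i) for i in range(DEGREE)]))
norm_basis = [gf_pow(gamma_norm, 1 << i) for i in range(DEGREE)]

poly_int = span_table(poly_basis)               # external -> internal
poly_ext = {a: b for b, a in poly_int.items()}  # internal -> external
norm_int = span_table(norm_basis)
norm_ext = {a: b for b, a in norm_int.items()}
gamma_inv = gf_pow(0x03, (1 << DEGREE) - 2)     # gamma^{-1} = gamma^(2^m - 2)

# Lemma 1: if B[0] = 0, multiplying by gamma^{-1} shifts the coefficients down.
lemma1 = all(poly_ext[gf_mul(poly_int[B], gamma_inv)] == B >> 1
             for B in range(0, 1 << DEGREE, 2))
# Lemma 2 with q = 2: squaring rotates the normal-basis coefficients up by one.
lemma2 = all(norm_ext[gf_mul(norm_int[B], norm_int[B])]
             == ((B << 1) | (B >> (DEGREE - 1))) & MASK
             for B in range(1 << DEGREE))
```

Both `lemma1` and `lemma2` evaluate to `True` over every representation in this toy field; the check says nothing about efficiency, only about the shifting and rotation behavior the export algorithms rely on.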

We still need a technique for obtaining the coefficient $B[0]$ or $B[m-1]$. From
the fact that the coefficients of the internal and external representations are
related by a change-of-basis matrix $M$, with $A^T = M B^T$, we know that a coefficient
$B[i]$ can be obtained by a linear combination

$$B[i] = \sum_{j=0}^{m-1} M^{-1}[i,j]\,A[j],$$

where the values $M^{-1}[i,j] \in GF(q)$ are elements of the matrix $M^{-1}$. We can
thus obtain a coefficient $B[i]$ by operations over the ground field $GF(q)$. We may
also compute the coefficient $B[i]$ with only internal-basis operations, as will be
shown in the next lemma.

In preparation, we will need to consider the multiplication matrices for the
internal basis. Let $\omega_0, \ldots, \omega_{m-1}$ be the internal basis. The multiplication matrix
for the coefficient at index $k$, denoted $K_k$, is the $m \times m$ matrix whose $[i,j]$th
element, $0 \le i, j < m$, is the coefficient at index $k$ of the representation in the
internal basis of the product $\omega_i\omega_j$. In other words, the matrices are defined so
that for all $i, j$, $0 \le i, j < m$,

$$\omega_i\omega_j = \sum_{k=0}^{m-1} K_k[i,j]\,\omega_k.$$

Each multiplication matrix is invertible, since for any nonzero element $x$ the map
$y \mapsto (xy)[k]$ is a nonzero linear function (as $y$ ranges over the field, so does $xy$).
It follows from this definition that the coefficient at index 0 of the product of two
internal representations $R$ and $S$, which we may write as $(R \times S)[0]$, is equal to $R K_0 S^T$.

Lemma 3. Let $s_0, \ldots, s_{m-1}$ be elements of $GF(q)$, and let $K_0$ be the multiplication
matrix for the coefficient at index 0 in the internal basis. Then for any
internal representation $A$ of an element,

$$\sum_{j=0}^{m-1} s_j A[j] = (A \times V)[0],$$

where $V$ is defined by $V^T = K_0^{-1}\,[s_0 \; \cdots \; s_{m-1}]^T$.

Proof. Since the multiplication matrix $K_0$ is invertible, the element $V$ exists. By
definition of multiplication, we have $(A \times V)[0] = A K_0 V^T = A K_0 K_0^{-1} [s_0 \; \cdots \; s_{m-1}]^T$,
which equals the desired linear function $\sum_j s_j A[j]$.

Lemma 3 shows that any linear combination of coefficients of the internal
representation of an element may be computed with internal-basis operations.
To generalize the result, we denote by $V_i$ the value such that

$$B[i] = (A \times V_i)[0],$$

i.e., the one where the values $s_0, \ldots, s_{m-1}$ are the matrix row $M^{-1}[i,0], \ldots,
M^{-1}[i,m-1]$. Like the internal representation $G$ of the generator of the external
basis, a value $V_i$ is particular to an external basis; a different set of values $V_i$
would be needed for each external basis with which one might want to convert.
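Lemma 3 can be illustrated in the same assumed toy setting ($q = 2$, $m = 8$, internal polynomial basis modulo $x^8 + x^4 + x^3 + x + 1$, so $\omega_j = x^j$). The sketch builds $K_0$, inverts it over GF(2), and forms $V$ from an arbitrary weight vector $s$, stored as a bitmask with bit $j$ equal to $s_j$. To obtain a particular $V_i$, one would take $s$ to be row $i$ of $M^{-1}$; the helper names are hypothetical.

```python
DEGREE = 8
MODULUS = 0x11B  # assumed internal basis: x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply two internal representations (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> DEGREE) & 1:
            a ^= MODULUS
    return result


def invert_rows(rows):
    """Gauss-Jordan inversion over GF(2); rows are int bitmasks."""
    a, inv = list(rows), [1 << r for r in range(DEGREE)]
    for c in range(DEGREE):
        p = next(r for r in range(c, DEGREE) if (a[r] >> c) & 1)
        a[c], a[p] = a[p], a[c]
        inv[c], inv[p] = inv[p], inv[c]
        for r in range(DEGREE):
            if r != c and (a[r] >> c) & 1:
                a[r] ^= a[c]
                inv[r] ^= inv[c]
    return inv


def matvec(rows, v):
    """Matrix-vector product over GF(2): bit r is the parity of rows[r] & v."""
    return sum((bin(rows[r] & v).count("1") & 1) << r for r in range(DEGREE))


# K_0[i, j] = coefficient at index 0 of omega_i * omega_j, with omega_j = x^j.
K0_ROWS = [sum((gf_mul(1 << i, 1 << j) & 1) << j for j in range(DEGREE))
           for i in range(DEGREE)]
K0_INV_ROWS = invert_rows(K0_ROWS)


def make_v(s):
    """Return V with V^T = K_0^{-1} s^T (Lemma 3)."""
    return matvec(K0_INV_ROWS, s)


def linear_form(A, s):
    """sum_j s_j A[j] over GF(2), computed directly from the coefficients."""
    return bin(A & s).count("1") & 1


def via_multiplication(A, V):
    """(A x V)[0]: one field multiplication, then read coefficient 0."""
    return gf_mul(A, V) & 1
```

The point of the construction is that `via_multiplication` uses only an internal-basis multiplication, yet agrees with `linear_form` for every `A`, so a single stored constant replaces a row of $M^{-1}$.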
In what follows, we present four conversion algorithms, for importing and
exporting with external polynomial and normal bases. For each algorithm, we 
measure the number of full multiplications and scalar multiplications involved 
and the amount of storage required for constants and intermediate results. As 
noted previously, there are direct import algorithms based on direct compu- 
tation of Equations (1) and (2); we give different versions that allow further 
performance improvements. 

As we will see, each basic conversion algorithm requires the storage of only
one or two constants (vectors of length $m$). So for $GF(q^m)$ the total storage
requirement is $O(m \log q)$ bits, compared to $O(m^2 \log q)$ bits for previous
solutions.



3.2 Importing from a Polynomial Basis 

Algorithm ImportPoly converts from a polynomial-basis representation for
$GF(q^m)$ over $GF(q)$ to an internal representation over the same ground field,
primarily with internal-basis operations.

Input: B[0], ..., B[m − 1], the external representation to be converted
Output: A, the corresponding internal representation
Constants: G, the internal representation of the generator of the external basis

proc ImportPoly
  A ← 0
  for i ← m − 1 down to 0 do
    A ← A × G
    A ← A + B[i] × I
  endfor

The algorithm processes one coefficient per iteration, scanning from highest
index to lowest, so that each term $B[i]$ is multiplied by $G$ a total of $i$ times, i.e., by $G^i$.
It involves $m$ full multiplications (and may also involve $m$ scalar multiplications,
depending on the form of $I$), and requires storage for one constant.

We can view the above algorithm as computing the internal representation
$A$ according to the following formula.

$$A = B[0] \times I + G \times (B[1] \times I + G \times (B[2] \times I + \cdots + G \times (B[m-1] \times I)\cdots)).$$

So ImportPoly bears some similarity to the method of evaluating a polynomial
$f(x)$ at a given value $x^*$ using Horner's rule [2]. More specifically, $f(x^*)$ can be
evaluated with $O(m)$ operations by rewriting $f(x)$ as follows.

$$f(x^*) = a_0 + x^*(a_1 + x^*(a_2 + \cdots + x^*(a_{m-1})\cdots)).$$

There are some distinctions between the two methods. The inputs to basis conversion
are the coefficients $B[0], \ldots, B[m-1]$, and the generator $G$ is fixed. In
contrast, the input to polynomial evaluation is the value $x^*$, and the coefficients
$a_0, \ldots, a_{m-1}$ are fixed.
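A minimal Python sketch of ImportPoly, assuming $q = 2$, $m = 8$, an internal polynomial basis modulo $x^8 + x^4 + x^3 + x + 1$ (so $I$ is the integer `1`), and a hypothetical external generator whose internal representation is $G$ = `0x03`. Over GF(2) the scalar multiplication $B[i] \times I$ reduces to a conditional XOR; `import_direct` is a reference that evaluates Equation (1) term by term.

```python
DEGREE = 8
MODULUS = 0x11B  # assumed internal basis: x^8 + x^4 + x^3 + x + 1
IDENTITY = 0x01  # I: internal representation of 1 in that basis
G = 0x03         # assumed internal representation of the external generator


def gf_mul(a, b):
    """Multiply two internal representations (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> DEGREE) & 1:
            a ^= MODULUS
    return result


def import_poly(B):
    """ImportPoly: Horner-style evaluation; bit i of B is the coefficient B[i]."""
    A = 0
    for i in range(DEGREE - 1, -1, -1):
        A = gf_mul(A, G)        # A <- A x G (one full multiplication)
        if (B >> i) & 1:        # A <- A + B[i] x I; over GF(2) the scalar
            A ^= IDENTITY       # multiplication is just a conditional addition
    return A


def import_direct(B):
    """Reference: add up B[i] x G^i term by term, as in Equation (1)."""
    A, power = 0, IDENTITY
    for i in range(DEGREE):
        if (B >> i) & 1:
            A ^= power
        power = gf_mul(power, G)
    return A
```

Only $G$ is stored; for example `import_poly(0b10)` returns `G` itself, since the external coefficient vector $(0, 1, 0, \ldots)$ represents the generator.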
It is possible to reduce the number of iterations of the loop and thereby
improve performance by processing more than one coefficient per iteration. One
of the ways to do this is, in the case that $m$ is even, to change the loop to

  for i ← m/2 − 1 down to 0 do
    A ← A × G
    A ← A + B[i + m/2] × G^{m/2} + B[i] × I
  endfor

where $G^{m/2}$ is an additional constant. Although the number of scalar multiplications
remains the same, the number of full multiplications can be reduced by
up to a factor of $k$ if $k$ coefficients are processed per iteration. In addition, all
$k$ coefficients can be processed in parallel.

Another improvement is to unroll the first iteration and start with
$A \leftarrow B[m-1] \times I$.
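The two-coefficients-per-iteration loop can be sketched the same way, again assuming $q = 2$, $m = 8$, the internal basis modulo $x^8 + x^4 + x^3 + x + 1$, and a hypothetical external generator $G$ = `0x03`. The loop body performs one full multiplication for every two coefficients, at the price of storing $G^{m/2}$.

```python
DEGREE = 8
MODULUS = 0x11B  # assumed internal basis: x^8 + x^4 + x^3 + x + 1
IDENTITY = 0x01  # I: internal representation of 1
G = 0x03         # assumed internal representation of the external generator
HALF = DEGREE // 2


def gf_mul(a, b):
    """Multiply two internal representations (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> DEGREE) & 1:
            a ^= MODULUS
    return result


G_HALF = IDENTITY
for _ in range(HALF):
    G_HALF = gf_mul(G_HALF, G)  # the additional constant G^{m/2}


def import_poly_two(B):
    """ImportPoly variant processing B[i] and B[i + m/2] together (m even)."""
    A = 0
    for i in range(HALF - 1, -1, -1):
        A = gf_mul(A, G)            # only m/2 full multiplications in total
        if (B >> (i + HALF)) & 1:
            A ^= G_HALF             # B[i + m/2] x G^{m/2}
        if (B >> i) & 1:
            A ^= IDENTITY           # B[i] x I
    return A


def import_direct(B):
    """Reference: add up B[i] x G^i term by term, as in Equation (1)."""
    A, power = 0, IDENTITY
    for i in range(DEGREE):
        if (B >> i) & 1:
            A ^= power
        power = gf_mul(power, G)
    return A
```

Correctness follows from regrouping $\sum_i B[i] G^i$ as $\sum_{i < m/2} (B[i] + B[i + m/2] G^{m/2}) G^i$, which is exactly what the loop evaluates by Horner's rule.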



3.3 Importing from a Normal Basis 

Algorithm ImportNormal converts from a normal-basis representation for
$GF(q^m)$ over $GF(q)$ to an internal representation over the same ground field,
primarily with internal-basis operations.

Input: B[0], ..., B[m − 1], the external representation to be converted
Output: A, the corresponding internal representation
Constants: G, the internal representation of the generator of the external basis

proc ImportNormal
  A ← 0
  for i ← m − 1 down to 0 do
    A ← A^q + B[i] × G
  endfor

The algorithm processes one coefficient per iteration, scanning from highest
index to lowest, so that each term $B[i] \times G$ is raised to the power $q$ a total of $i$
times, giving $B[i] \times G^{q^i}$. (We make use of the fact that $(A + B[i] \times G)^q = A^q + B[i] \times G^q$,
since raising to the power $q$ is additive and fixes every element of $GF(q)$.) It involves $m$
exponentiations to the power $q$ and $m$ scalar multiplications, and requires storage
for one constant, in addition to the intermediate results for exponentiation. (The
exponentiation will typically involve about $1.5 \log_2 q$ multiplications and require
storage for one intermediate result, though better performance is possible if the
internal basis is a normal basis. For $q = 2$, the exponentiation will be just a
squaring.)

As with ImportPoly, it is possible to improve performance by unrolling the 
first iteration or by processing more than one coefficient per iteration. 
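A minimal Python sketch of ImportNormal for $q = 2$, where each exponentiation $A^q$ is a single squaring. It assumes $m = 8$ and the internal basis modulo $x^8 + x^4 + x^3 + x + 1$; the external normal-basis generator is simply the first element found by the hypothetical `is_normal` search, an illustrative choice rather than one from the paper.

```python
DEGREE = 8
MODULUS = 0x11B  # assumed internal basis: x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply two internal representations (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> DEGREE) & 1:
            a ^= MODULUS
    return result


def gf_pow(a, e):
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def is_normal(g):
    """g generates a normal basis iff its conjugates g^(2^i) span all 2^m elements."""
    span = {0}
    for i in range(DEGREE):
        c = gf_pow(g, 1 << i)
        span |= {s ^ c for s in span}
    return len(span) == 1 << DEGREE


G = next(g for g in range(2, 1 << DEGREE) if is_normal(g))  # illustrative choice


def import_normal(B):
    """ImportNormal with q = 2: the exponentiation A^q is a single squaring."""
    A = 0
    for i in range(DEGREE - 1, -1, -1):
        A = gf_mul(A, A)        # A <- A^q
        if (B >> i) & 1:
            A ^= G              # A <- A + B[i] x G
    return A


def import_normal_direct(B):
    """Reference: sum of B[i] x G^(q^i), as in Equation (2)."""
    A = 0
    for i in range(DEGREE):
        if (B >> i) & 1:
            A ^= gf_pow(G, 1 << i)
    return A
```

As with ImportPoly, only the single constant $G$ is stored; the reference function recomputes each $G^{q^i}$ and is there only for checking.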







3.4 Exporting to a Polynomial Basis 

Algorithm ExportPoly converts from an internal representation for $GF(q^m)$
over $GF(q)$ to an external polynomial-basis representation over the same ground
field, primarily with internal-basis operations.

Input: A, the internal representation to be converted
Output: B[0], ..., B[m − 1], the corresponding external representation
Constants:
  G^{-1}, the internal representation of the inverse of the generator of the external basis
  V_0, the value such that (A × V_0)[0] = B[0] (see Lemma 3)

proc ExportPoly
  A ← A × V_0
  for i ← 0 to m − 1 do
    B[i] ← A[0]
    A ← A − B[i] × V_0
    A ← A × G^{-1}
  endfor

The algorithm computes one coefficient per iteration, applying the observations
previously given, with the additional enhancement of premultiplying by
the value $V_0$.² The algorithm involves $m + 1$ full multiplications and $m$ scalar
multiplications, and requires storage for two constants. The input $A$ is modified
by the algorithm.
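A minimal Python sketch of ExportPoly, under the same toy assumptions as before: $q = 2$, $m = 8$, internal basis modulo $x^8 + x^4 + x^3 + x + 1$, hypothetical external generator $G$ = `0x03`. The constant $V_0$ is derived offline from Lemma 3, using a brute-force table only to read off row 0 of $M^{-1}$; that setup step is acceptable only because the field is tiny, and the export loop itself uses nothing but internal-basis operations.

```python
DEGREE = 8
MODULUS = 0x11B  # assumed internal basis: x^8 + x^4 + x^3 + x + 1
G = 0x03         # assumed internal representation of the external generator


def gf_mul(a, b):
    """Multiply two internal representations (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> DEGREE) & 1:
            a ^= MODULUS
    return result


def gf_pow(a, e):
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def invert_rows(rows):
    """Gauss-Jordan inversion over GF(2); rows are int bitmasks."""
    a, inv = list(rows), [1 << r for r in range(DEGREE)]
    for c in range(DEGREE):
        p = next(r for r in range(c, DEGREE) if (a[r] >> c) & 1)
        a[c], a[p] = a[p], a[c]
        inv[c], inv[p] = inv[p], inv[c]
        for r in range(DEGREE):
            if r != c and (a[r] >> c) & 1:
                a[r] ^= a[c]
                inv[r] ^= inv[c]
    return inv


def matvec(rows, v):
    return sum((bin(rows[r] & v).count("1") & 1) << r for r in range(DEGREE))


def import_direct(B):
    """Equation (1), used here only to set up and check the constants."""
    A, power = 0, 1
    for i in range(DEGREE):
        if (B >> i) & 1:
            A ^= power
        power = gf_mul(power, G)
    return A


# --- one-time setup (brute force is acceptable only for this toy field) ---
G_INV = gf_pow(G, (1 << DEGREE) - 2)
EXTERNAL_OF = {import_direct(B): B for B in range(1 << DEGREE)}
# Row 0 of M^{-1}: s_j is coefficient B[0] of the external representation of x^j.
S0 = sum((EXTERNAL_OF[1 << j] & 1) << j for j in range(DEGREE))
K0_ROWS = [sum((gf_mul(1 << i, 1 << j) & 1) << j for j in range(DEGREE))
           for i in range(DEGREE)]
V0 = matvec(invert_rows(K0_ROWS), S0)  # Lemma 3: (A x V0)[0] = B[0]


def export_poly(A):
    """ExportPoly: returns the external representation as a bitmask."""
    B = 0
    A = gf_mul(A, V0)             # premultiply once by V_0
    for i in range(DEGREE):
        b = A & 1                 # B[i] <- A[0]
        B |= b << i
        if b:
            A ^= V0               # A <- A - B[i] x V_0 (subtraction is XOR)
        A = gf_mul(A, G_INV)      # A <- A x G^{-1}: shift coefficients down
    return B
```

The loop performs $m$ multiplications by $G^{-1}$ plus the single premultiplication, matching the $m + 1$ full multiplications counted in the text, and only `G_INV` and `V0` need to be kept after setup.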

As with the previous algorithms, more than one coefficient can be processed
per iteration. An additional constant such as $V_{m/2}/V_0$ is required, where $V_{m/2}$
is the value such that $(A \times V_{m/2})[0] = B[m/2]$; the coefficient $B[i + m/2]$ would
be computed as

  T ← A × V_{m/2}/V_0
  B[i + m/2] ← T[0]

The number of multiplications is not reduced in this case, due to the method
for computing the coefficient $B[i + m/2]$. The improvement would be more
significant if the coefficients were computed as a direct linear function of $A$, provided
that such computation were efficient.

Another improvement is to unroll the last iteration and end with $B[m-1] \leftarrow A[0]$.

² This is the reason that the correction step involves subtracting the value $B[i] \times V_0$
rather than $B[i] \times I$. The alternative to premultiplying $A$ by $V_0$ is to multiply it by
$V_0$ during each iteration before computing the coefficient $B[i]$, but this involves an
additional multiplication per iteration.
3.5 Exporting to a Normal Basis

Algorithm ExportNormal converts from an internal representation for $GF(q^m)$
over $GF(q)$ to a normal-basis representation over the same ground field,
primarily with internal-basis operations.

Input: A, the internal representation to be converted
Output: B[0], ..., B[m − 1], the corresponding external representation
Constants: V_{m−1}, the value such that (A × V_{m−1})[0] = B[m − 1] (see Lemma 3)

proc ExportNormal
  for i ← m − 1 down to 0 do
    T ← A × V_{m−1}
    B[i] ← T[0]
    A ← A^q
  endfor

The algorithm computes one coefficient per iteration, applying the observa- 
tions previously given. The algorithm involves m exponentiations to the power 
q and m full multiplications, and requires storage for one constant and one in- 
termediate result, T, in addition to the intermediate results for exponentiation. 
The input A, though modified by the algorithm, returns to its initial value. 

As with ExportPoly, it is possible to improve performance by unrolling 
the last iteration or by processing more than one coefficient per iteration. 
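A minimal Python sketch of ExportNormal for $q = 2$, assuming $m = 8$, the internal basis modulo $x^8 + x^4 + x^3 + x + 1$, and an illustrative normal-basis generator found by search. As in the ExportPoly sketch, $V_{m-1}$ is derived once via Lemma 3 from a brute-force table (toy field only); the loop extracts $B[m-1]$, then squares to rotate the next coefficient into position $m-1$, as Lemma 2 describes.

```python
DEGREE = 8
MODULUS = 0x11B  # assumed internal basis: x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply two internal representations (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> DEGREE) & 1:
            a ^= MODULUS
    return result


def gf_pow(a, e):
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def invert_rows(rows):
    """Gauss-Jordan inversion over GF(2); rows are int bitmasks."""
    a, inv = list(rows), [1 << r for r in range(DEGREE)]
    for c in range(DEGREE):
        p = next(r for r in range(c, DEGREE) if (a[r] >> c) & 1)
        a[c], a[p] = a[p], a[c]
        inv[c], inv[p] = inv[p], inv[c]
        for r in range(DEGREE):
            if r != c and (a[r] >> c) & 1:
                a[r] ^= a[c]
                inv[r] ^= inv[c]
    return inv


def matvec(rows, v):
    return sum((bin(rows[r] & v).count("1") & 1) << r for r in range(DEGREE))


def is_normal(g):
    span = {0}
    for i in range(DEGREE):
        c = gf_pow(g, 1 << i)
        span |= {s ^ c for s in span}
    return len(span) == 1 << DEGREE


G = next(g for g in range(2, 1 << DEGREE) if is_normal(g))  # illustrative choice


def import_normal_direct(B):
    """Equation (2), used only to set up and check the constant."""
    A = 0
    for i in range(DEGREE):
        if (B >> i) & 1:
            A ^= gf_pow(G, 1 << i)
    return A


# --- one-time setup (brute force is acceptable only for this toy field) ---
EXTERNAL_OF = {import_normal_direct(B): B for B in range(1 << DEGREE)}
# Row m-1 of M^{-1}: s_j is coefficient B[m-1] of the external representation of x^j.
S_LAST = sum(((EXTERNAL_OF[1 << j] >> (DEGREE - 1)) & 1) << j for j in range(DEGREE))
K0_ROWS = [sum((gf_mul(1 << i, 1 << j) & 1) << j for j in range(DEGREE))
           for i in range(DEGREE)]
V_LAST = matvec(invert_rows(K0_ROWS), S_LAST)  # Lemma 3: (A x V_LAST)[0] = B[m-1]


def export_normal(A):
    """ExportNormal with q = 2; returns the external representation as a bitmask."""
    B = 0
    for i in range(DEGREE - 1, -1, -1):
        T = gf_mul(A, V_LAST)   # T <- A x V_{m-1}
        B |= (T & 1) << i       # B[i] <- T[0]
        A = gf_mul(A, A)        # A <- A^q: rotates the coefficients up (Lemma 2)
    return B
```

After the $m$ squarings the local copy of `A` equals its starting value again, since $A^{q^m} = A$, which is why the text notes that the input returns to its initial value.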

4 Extensions 

The algorithms presented so far all assume that the ground field is the same for
the internal and the external basis and has the same representation.

If the ground fields are the same but have different representations, the
individual coefficients can be converted through techniques similar to those for the
entire representation.

If the ground fields are different, however, we can convert
individual “subcoefficients” of each coefficient, where the subcoefficients are
elements of $GF(p)$. In the import algorithms, we would add terms of the form
$B[i][j] \times H_j$ (or $B[i][j] \times G H_j$) to $A$, where $B[i][j]$ is a subcoefficient and $H_j$ is
the internal representation of an element of the ground-field basis. In the export
algorithms, we would multiply by values like $V_{0j}$, where $(A \times V_{0j})[0] = B[0][j]$.

The storage requirements for these methods would depend on the degree of the
ground field over $GF(p)$, and would be modest in many cases. More storage-efficient
algorithms are possible, however, involving techniques different from
those described here.

The algorithms presented here are examples of a more general class of conversion
algorithms, where successive coefficients are processed as an internal
representation is “shifted” or “rotated” in terms of its corresponding external-basis
representation. For the algorithms here, such “external” shifting or rotation
is accomplished by such operations as exponentiation, multiplication by $G$,
or multiplication by $G^{-1}$ (combined with subtraction of the $B[0]$ coefficient),
depending on the algorithm. The import algorithms interleave shifting with
insertion of coefficients into the internal representation, by addition of the term
$B[i] \times I$ or $B[i] \times G$; the export algorithms interleave shifting with extraction of
the coefficients, by a computation of the form $(A \times V_i)[0]$.

Algorithms of the same class may be constructed for any choices of basis for 
which an efficient external shifting or rotation operation can be defined. 

5 Applications 

Many public-key cryptosystems are based on operations in large finite mathematical
groups, and the security of these cryptosystems relies on the computational
intractability of computing discrete logarithms in the underlying groups.
The group operations usually consist of arithmetic in finite fields, in particular
$GF(p)$ and $GF(2^m)$. In this section, we focus on the application of our conversion
algorithms to elliptic curve cryptosystems over $GF(2^m)$. The general
principles extend to other applications as well.

At a high level, an elliptic curve over $GF(2^m)$ is a set of points which form
a group with respect to certain rules for adding two points. Such an addition
consists of a series of field operations in $GF(2^m)$. In a generic implementation
using projective coordinates, adding two distinct points needs 10 multiplications
and 3 squarings, and doubling a point needs 5 multiplications and 5 squarings.³
Elliptic curve cryptosystems over $GF(2^m)$ that are of particular interest
today are analogs of conventional discrete logarithm cryptosystems. Such elliptic
curve schemes (e.g., EC Diffie-Hellman and EC DSA) usually involve one
or two elliptic curve scalar multiplications of the form $Q = kP$, where $P$ and
$Q$ are points and $k$ is an integer of length about $m$. Again in a generic
implementation, such an operation can be done with $m$ doublings of a point and $m/2$
additions of two distinct points, giving a total of $(3 \times m/2 + 5 \times m) = 6.5m$ field
squarings and $(10 \times m/2 + 5 \times m) = 10m$ field multiplications.

To illustrate the cost of conversion, we consider a general situation in which
two parties implement some elliptic curve scheme over $GF(2^m)$ with different
choices of basis. In such a situation, each elliptic curve scalar multiplication
in the scheme would require at most two conversions by one of the parties,
one before and one after the operation.⁴ The following table compares the cost
of two conversions (back and forth) with the cost of one elliptic curve scalar
multiplication.

³ In general, the number of operations depends on the particular formulas and
constraints on the parameters of the curve. The number given here is based on Annex
A “Number-theoretic algorithms” in

⁴ For example, suppose Alice has a polynomial basis and Bob a normal basis, and
they want to agree on a shared key using EC Diffie-Hellman. After obtaining Bob’s 
public key P, Alice would import P from the normal basis to the polynomial basis, 
compute Q = kP in the polynomial basis, and export Q from the polynomial basis 
back to the normal basis. Alternatively, Bob could perform the conversions. 
| Operation | Multiplications | Squarings |
|---|---|---|
| ImportPoly + ExportPoly | 2m + 1 | 0 |
| ImportNormal + ExportNormal | m | 2m |
| EC scalar multiplication | 10m | 6.5m |

Based on the above table, we can estimate the extra cost of conversion compared
with one elliptic curve scalar multiplication. When the external basis is
a polynomial basis, the extra cost (ImportPoly + ExportPoly) is around
$2/(10 + 6.5 \times 0.5) \approx 15\%$ for an internal polynomial basis (assuming squaring
is twice as fast as multiplication for an internal polynomial basis) and
about $2/10 = 20\%$ for an internal normal basis (since squarings are essentially
free in an internal normal basis). Similarly, when the external basis is
a normal basis, the extra cost (ImportNormal + ExportNormal) is about
$(1 + 2 \times 0.5)/(10 + 6.5 \times 0.5) \approx 15\%$ for an internal polynomial basis and about
$1/10 = 10\%$ for an internal normal basis. To summarize, the extra cost of conversion
is between 10% and 20%, depending on the implementation.
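The four percentages above follow from one formula, sketched here in Python. The sketch uses the leading-order counts from the table (dropping the "+1" in $2m + 1$) and the text's assumption that a squaring costs half a multiplication in an internal polynomial basis and nothing in an internal normal basis; the function name is hypothetical.

```python
# Leading-order costs per unit of m for one EC scalar multiplication.
EC_MULTS, EC_SQUARINGS = 10.0, 6.5


def overhead(conv_mults, conv_squarings, squaring_cost):
    """Conversion cost relative to one EC scalar multiplication.

    squaring_cost is the price of a squaring in units of a multiplication:
    0.5 for an internal polynomial basis, 0.0 for an internal normal basis.
    """
    conv = conv_mults + conv_squarings * squaring_cost
    ec = EC_MULTS + EC_SQUARINGS * squaring_cost
    return conv / ec


cases = {
    ("polynomial external", "polynomial internal"): overhead(2, 0, 0.5),
    ("polynomial external", "normal internal"): overhead(2, 0, 0.0),
    ("normal external", "polynomial internal"): overhead(1, 2, 0.5),
    ("normal external", "normal internal"): overhead(1, 2, 0.0),
}
for (ext, internal), ratio in cases.items():
    print(f"{ext}, {internal}: {ratio:.1%}")
```

Run as is, it prints 15.1%, 20.0%, 15.1% and 10.0% for the four combinations, matching the rounded figures in the text.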

Overall, we conclude that our conversion algorithms incur only a small extra 
cost in an elliptic curve cryptosystem, and that the memory requirement is 
quite reasonable: only a few additional constants or intermediate values need 
to be stored. The cost can be reduced still further by additional optimizations 
such as processing more than one coefficient at a time, with the only additional 
requirement being the storage of a small number of additional elements. 

6 Conclusions 

We have described in this paper several new algorithms for basis conversion 
that require much less storage than previous solutions. Our algorithms are also 
very efficient in that they involve primarily finite-field operations. This has the 
advantage of benefiting from the optimizations that are presumably already 
available for finite-field operations. 

As examples, our algorithms can convert from a polynomial basis to a nor- 
mal basis and from a normal basis to a polynomial basis. Related algorithms 
can convert to and from other choices of basis for which an efficient “external 
shifting” operation can be defined. Our algorithms are particularly applicable to 
even-characteristic finite fields, which are typical in cryptography. 

The variation in choice of basis for representing finite fields has affected in- 
teroperability, especially of cryptosystems. With our algorithms, it is possible 
to extend an implementation in one basis so that it supports other choices of 
basis at only a small additional cost, thereby providing the desired interoperabil- 
ity and extending the set of parties that can communicate with cryptographic 
operations. 








Acknowledgment 

We would like to thank the IEEE P1363 working group for motivating this
problem, Leonid Reyzin for helpful discussions, and the anonymous referees for
their comments. We would also like to thank Rob Lambert for bringing Horner's
Rule to our attention.

References 

1. ANSI X9.62: The Elliptic Curve Digital Signature Algorithm (ECDSA), draft, 
November 1997. 

2. T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. The 
MIT Press, 1990. 

3. IEEE P1363: Standard Specifications for Public-Key Cryptography, Draft version 3, 
May 1998. http://stdsbbs.ieee.org/groups/1363/draft.html. 

4. A. Menezes, I. Blake, X. Gao, R. Mullin, S. Vanstone, and T. Yaghoobian. Appli- 
cations of Finite Fields. Kluwer Academic Publishers, 1993. 

5. A. Menezes. Elliptic Curve Public Key Cryptosystems. Kluwer Academic Publishers, 
1993. 

6. R. Lidl and H. Niederreiter. Finite Fields, volume 20 of Encyclopedia of Mathematics
and Its Applications. Addison-Wesley, 1983.




Verifiable Partial Sharing of Integer Factors 



Wenbo Mao 

Hewlett-Packard Laboratories 

Filton Road, Stoke Gifford, Bristol BS34 8QZ, United Kingdom 
wm@hplb.hpl.hp.com



Abstract. It is not known to date how to partially share the factors of 
an integer (e.g., an RSA modulus) with verifiability. We construct such a 
scheme on exploitation of a significantly lowered complexity for factoring 
n = pq using a non-trivial factor of 



1 Introduction 

Partial key escrow purports to add a great deal of difficulty to mass privacy
intrusion, which is possible in ordinary key escrow with abusive authorities, while
preserving the property of an ordinary escrowed cryptosystem in targeting a small
number of criminals. In partial key escrow, a portion of an individual's private
key with an agreed and proved size will not be in escrow. Key recovery requires
a non-trivial effort to determine the missing part. A partial key escrow scheme
must ensure that the determination of the missing key part will only be possible
after recovery of the key part which is in escrow (usually with a set of distributed
agents who are collectively trusted to share the escrowed key part). If the missing
part can be determined before, or without taking, a prescribed key recovery
procedure, then off-line pre-computations can be employed for finding the missing
part, and this can be done on a massive scale with many or all users targeted.
This constitutes a so-called premature key recovery attack: the missing key
part is not really missing, and the whole private key of each user can be made
available right after recovery of the escrowed key part. The effect of partial key
escrow is thereby nullified, and the scenario of mass privacy intrusion can still be
assumed, just as in the case of an ordinary key escrow scheme. In their recent work
“Verifiable partial key escrow”, Bellare and Goldwasser discussed scenarios
of premature key recovery attacks.

Thus, a necessary step in verifiable partial key escrow is for a key owner to prove that a private key contains a hidden number which will not be in escrow and has an agreed size. Discovering this number requires first recovering the escrowed part of the private key, and only after that recovery can an exhaustive search be launched to determine the missing number. Given the proven size of the missing number, the cost of this search is well understood.

The previous verifiable partial key escrow scheme of Bellare and Goldwasser was proposed for discrete-logarithm-based cryptosystems. In that realization,

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 94-^^^ 1999. 

© Springer-Verlag Berlin Heidelberg 1999 




a key $y = g^x$ is cut into two parts, $y = y_1 y_2 = g^{x_1} g^{x_2} = g^{x_1 + x_2}$, where $x_1$ is a partial private key which is proven to be in escrow, and $x_2$ is the remaining private component, which will not be in escrow but has an agreed bit size to be proven by the key owner at key escrow time. Using a bit commitment scheme, these proofs can be done without revealing the partial private and public keys. Key recovery has to proceed by recovering $y_1$, then computing $y_2 = y/y_1$, followed by searching for $x_2$ from $y_2$. Note that neither of the partial public keys should be revealed before $x_1$ is recovered, or else the search for $x_2$ could take place before $x_1$ is recovered.
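The split and the recovery order can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the group parameters $P = 2039$, $Q = 1019$, $g = 4$ and the 8-bit size of $x_2$ are hypothetical illustration values (the scheme targets large groups and an 80-bit missing part), and the function names are ours, not Bellare and Goldwasser's. The point it shows is that $x_2$ can only be searched for once $x_1$, hence $y_1$ and $y_2 = y/y_1$, is available.

```python
import secrets

# Hypothetical toy parameters, far too small for real use: P = 2Q + 1 with
# P, Q prime, and g = 4 = 2^2 is a square, so it has order Q in Z_P^*.
P, Q, g = 2039, 1019, 4
X2_BITS = 8  # agreed size of the non-escrowed part (80 bits in the real scheme)

def split_key():
    """Split x = x1 + x2: x1 goes into escrow, x2 is small with a proven size."""
    x1 = secrets.randbelow(Q)
    x2 = secrets.randbelow(2 ** X2_BITS)
    y = pow(g, x1, P) * pow(g, x2, P) % P   # y = y1 * y2 = g^(x1 + x2)
    return x1, x2, y

def recover_x2(x1, y):
    """Key recovery: y2 = y / y1 from the escrowed x1, then exhaustive search."""
    y2 = y * pow(pow(g, x1, P), -1, P) % P  # modular inverse needs Python 3.8+
    for cand in range(2 ** X2_BITS):        # cost grows as 2^X2_BITS
        if pow(g, cand, P) == y2:
            return cand
    raise ValueError("x2 not found in the agreed range")

x1, x2, y = split_key()
assert recover_x2(x1, y) == x2
```

Because $2^8 < Q$ here, the searched exponent is unique; a plain linear search is used for clarity rather than the square-root methods mentioned in the footnote.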

It is not known to date how to partially escrow integer factors (e.g., the prime factors of an RSA modulus) with verifiability. Full key escrow for integer-factoring-based cryptosystems can take the approach of escrowing a prime factor of an integer (the scheme in [?] achieves public verifiability). It is, however, not straightforward to devise a partial integer-factor sharing scheme along that approach. A major problem is to establish a precise cost for key recovery (e.g., a $2^{80}$-level time cost, which is non-trivial but manageable, at considerable expense, by a well-resourced agent, and has been chosen as a suitable workload for the partial key escrow scheme for discrete-log-based cryptosystems¹). Chopping an 80-bit block off a prime factor and throwing it away is unlikely to be a correct rendering because, in integer-factoring-based cryptosystems, a prime factor should be significantly larger than 80 bits, so the remaining escrowed part of the factor will be large enough to allow polynomial-time factoring methods (e.g., Coppersmith’s) to factor the integer in question. On the other hand, throwing away a much larger block may render a key unrecoverable.

We will construct a scheme for verifiable partial sharing of the factors of an integer. The central idea in our realization is an observation (one of the cryptanalysis results in [?]) of a significantly lowered and precisely measurable time complexity for factoring $n = pq$ if a factor of $\phi(n)$ (the Euler phi function) is known. This factor will be proven to have an agreed size and will be escrowed by means of verifiable secret sharing. We will argue that, without recovery of this factor, factoring $n$ remains as difficult as the original factoring problem.

In the remainder of the paper, Section 2 describes a lowered complexity for factoring $n$ with a known factor of $\phi(n)$; Section 3 constructs the proposed scheme; Section 4 analyzes the security and performance of the scheme; finally, Section 5 concludes the work.



2 Integer Factoring with Additional Knowledge

Let $n = pq$ for distinct primes $p$ and $q$. Then

$$n + 1 = (p-1)(q-1) + (p+q). \qquad (1)$$



¹ Searching a number of this size requires $2^{40}$ operations using the best known algorithms, which are based on a square-root reduction; we discuss this reduction in Section 2.
Let $r$ be a factor of $(p-1)(q-1)$ but not a factor of both $p-1$ and $q-1$ (otherwise $r^2$ would be a factor of $(p-1)(q-1)$). Then

$$p + q \equiv n + 1 \pmod{r}. \qquad (2)$$

When $r$ is smaller than $p + q$, congruence (2) means the equation

$$p + q = (n + 1 \bmod r) + kr \qquad (3)$$

for an unknown $k$. If $p + q$ becomes known, factoring $n$ into $p$ and $q$ is a simple calculation. Equation (3) shows that if $r$ is known, finding the quantity $p + q$ is equivalent to finding the unknown $k$, and hence the difficulty of factoring $n$ is equivalent to that of finding $k$.

Let $|a|$ denote the bit size of the integer $a$ in binary representation. For $|p+q| > |r|$, noting $(n + 1 \bmod r) < r$, we have

$$|k| + |r| - 1 \le |p+q| \le |k| + |r| + 1,$$

or

$$|k| \approx |p+q| - |r|.$$

So when $r$ is known, the time needed to determine $p + q$, or to factor $n$, is bounded above by performing

$$2^{|p+q| - |r|} \qquad (4)$$

computations. The algorithm is to search for $k$ in equation (3).
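A minimal Python sketch of this naive search follows. The toy values are assumptions chosen for illustration only: $p = 1019$, $q = 2039$, and $r = 2036 = 4 \cdot 509$, which divides $(p-1)(q-1) = 4 \cdot 509 \cdot 1019$ but divides neither $p-1$ nor $q-1$; the helper names are hypothetical. Each candidate $k$ gives a candidate $s = p + q$ from (3), and $p, q$ are then the roots of $z^2 - sz + n = 0$.

```python
from math import isqrt

def factor_from_sum(n, s):
    """Given n = pq and a candidate s = p + q, solve z^2 - s*z + n = 0."""
    disc = s * s - 4 * n
    if disc < 0:
        return None
    root = isqrt(disc)
    if root * root != disc:
        return None
    p, q = (s + root) // 2, (s - root) // 2
    return (p, q) if p * q == n else None

def factor_with_r_naive(n, r):
    """Search k in p + q = (n + 1 mod r) + k*r, i.e. about 2^(|p+q| - |r|) tries."""
    base = (n + 1) % r
    k = 0
    while base + k * r <= n + 1:   # p + q never exceeds n + 1
        pq = factor_from_sum(n, base + k * r)
        if pq is not None:
            return pq, k
        k += 1
    return None

print(factor_with_r_naive(1019 * 2039, 2036))  # ((2039, 1019), 1)
```

For these toy values the search stops at $k = 1$, since $n + 1 \bmod r = 1022$ and $p + q = 3058 = 1022 + 1 \cdot 2036$.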

The bound in (4) can be lowered significantly further. Note that most elements of $\mathbb{Z}_n^*$, the multiplicative group of integers modulo $n$, have orders larger than $p + q$ unless factoring $n$ is easy. This means that if we pick $u \in \mathbb{Z}_n^*$ arbitrarily and let $v$ be the least positive number satisfying

$$u^v \bmod n = 1,$$

then with overwhelming probability

$$v > p + q.$$

Combining (1) and (3), we have

$$n + 1 = (p-1)(q-1) + (n + 1 \bmod r) + kr. \qquad (5)$$

Raising $u$ to the power of both sides of (5), writing $w = u^r \bmod n$, and noting $u^{(p-1)(q-1)} \equiv 1 \pmod{n}$, we have

$$u^{\,n + 1 - (n + 1 \bmod r)} \equiv w^k \pmod{n}. \qquad (6)$$

We point out that the quantity $k$ in (6) is exactly that in (3), because $kr < p + q$, which should be smaller than the order of $u$ (namely, in transforming (3) into (6), the number $k$ is not reduced modulo the order of $u$). With $r$ (hence $w$) known, extracting $k$ from equation (6) using Shanks’ baby-step giant-step algorithm (see, e.g., [?], Section 5.4.1) needs only about

$$2^{\frac{|p+q| - |r|}{2}}$$

operations (multiplications in $\mathbb{Z}_n^*$) and the same order of space. This is a much lower bound than (4), as it is the positive square root of (4). We have thus proven the following statement.

Lemma 1. Let $n = pq$ for two distinct primes $p$, $q$, and let $r$ be a known factor of $\phi(n) = (p-1)(q-1)$ with $|r| < |p+q|$. Then $2^{\frac{|p+q| - |r|}{2}}$ can be used as an upper bound on the time and space for factoring $n$. □
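The baby-step giant-step extraction of $k$ from (6) can be sketched in Python as follows. This is an illustrative sketch, not the paper's implementation: the base $u = 2$, the parameter `k_bits` (an assumed bound on $|k|$, about $|p+q| - |r| + 1$) and the function names are our assumptions, and the toy modulus is the same $n = 1019 \cdot 2039$ with $r = 2036$. As the text argues, the order of $w$ is assumed to exceed the number of baby steps; each candidate $k$ is still checked by attempting the factorization, so spurious matches are discarded.

```python
from math import isqrt

def factor_from_sum(n, s):
    """Return (p, q) if s = p + q for n = pq, else None."""
    disc = s * s - 4 * n
    if disc < 0 or isqrt(disc) ** 2 != disc:
        return None
    root = isqrt(disc)
    p, q = (s + root) // 2, (s - root) // 2
    return (p, q) if p * q == n else None

def factor_with_r_bsgs(n, r, k_bits, u=2):
    """Shanks' baby-step giant-step for k in u^(n+1-(n+1 mod r)) = w^k (mod n),
    w = u^r mod n, then p + q = (n+1 mod r) + k*r.  k_bits is an assumed
    bound on |k| (about |p+q| - |r| + 1); u must be coprime to n."""
    base = (n + 1) % r
    w = pow(u, r, n)
    target = pow(u, n + 1 - base, n)
    m = isqrt(2 ** k_bits) + 1            # about 2^(k_bits/2) steps of each kind
    baby, cur = {}, 1
    for j in range(m):                    # baby steps: store w^j -> smallest j
        baby.setdefault(cur, j)
        cur = cur * w % n
    giant = pow(w, -m, n)                 # w^(-m) mod n (Python 3.8+)
    gamma = target
    for i in range(m + 1):                # giant steps: target * w^(-i*m)
        j = baby.get(gamma)
        if j is not None:
            pq = factor_from_sum(n, base + (i * m + j) * r)
            if pq is not None:            # discard spurious matches
                return pq
        gamma = gamma * giant % n
    return None

print(factor_with_r_bsgs(1019 * 2039, 2036, k_bits=2))  # (2039, 1019)
```

With `k_bits = 2` the table holds only three entries; for the parameters used later in the paper ($|p+q| - |r| = 80$) the same procedure would need on the order of $2^{40}$ steps and table entries, which is the bound stated in Lemma 1.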

A number of previous integer-factoring-based cryptosystems make use of a disclosed sizable factor of $\phi(n)$, where $n$ is the product of two secret primes. Each of these systems includes moduli settings that allow feasible factorization using our method. For instance, a modulus setting in [?] satisfies … = 35.

Remark. Lemma 1 states an upper bound, which means that factoring $n = pq$ with a known $r \mid (p-1)(q-1)$ needs at most $2^{\frac{|p+q| - |r|}{2}}$ operations. However, this complexity is the lowest known to date for computing small discrete logarithms. Any algorithm using fewer operations would provide a breakthrough improvement in solving the (small) discrete logarithm problem. The same argument applies to the verifiable partial key escrow scheme of Bellare and Goldwasser.

If $r$ is unknown, then equation (3), and hence equation (6), cannot be used. For $n$ the product of large primes $p$ and $q$, factorization is so far believed to be infeasible. The verifiable partial integer-factor sharing scheme specified in the next section makes use of this large difference in time complexity between factoring $n$ with a sizable factor of $(p-1)(q-1)$ and factoring it without one.



3 The Proposed Scheme 

A user (Alice) shall construct her public key $n = pq$ such that $p$ and $q$ are primes of the same size, $(p-1)(q-1)$ has a factor $r$ with $|p+q| - |r| = 80$, and $r$ is not a factor of both $p - 1$ and $q - 1$.

Alice shall then prove in zero-knowledge the following: (i) $n$ is the product of two hidden prime factors $p$ and $q$ of the same size; (ii) a hidden number $r$ is a factor of $(p-1)(q-1)$ but not of both $p - 1$ and $q - 1$, satisfying $|p+q| - |r| = 80$; (iii) $r$ is verifiably secret-shared among a set of agents (called shareholders). After all of these have been done, we know from the previous section that if $r$ is recovered by co-operating shareholders, $n$ can be factored by finding $k$ in (6) with $2^{40}$ operations using the fastest known algorithms for computing small discrete logarithms. On the other hand, if $r$ is not available, $n$ is in the same position as an ordinary RSA modulus, and given the size of $n$, no computationally feasible algorithm is known to factor it.

We have now fully described the working principle of a verifiable partial integer-factor sharing scheme. In the remainder of this section we construct computational zero-knowledge protocols for proving the required structure of $n$ and for verifiable secret sharing of $r$. These constructions use several previous results as building blocks, namely the work of Chaum and Pedersen (for proof of discrete logarithm equality), Damgård (for proof of integer size), Pedersen (for verifiable threshold secret sharing), and Mao (for proof of correct integer arithmetic).

These protocols use a public cyclic group with the following construction. Let $P$ be a large prime such that $Q = (P-1)/2$ is also prime. Let $f \in \mathbb{Z}_P^*$ be a fixed element of order $Q$. The public group is the multiplicative group generated by $f$. We assume that it is computationally infeasible to compute discrete logarithms to the base $f$. Once set up, the numbers $f$ and $P$ (hence $Q = (P-1)/2$) are announced for use by all entities in the system.
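One way to produce such public parameters is sketched below in Python, at toy sizes. The Miller-Rabin helper, the function name `public_group` and the default bit length are our assumptions, not part of the paper; the fixed bases make the primality test deterministic only for numbers below about $3.3 \times 10^{24}$, so real parameter generation would use a proper library. Squaring a random $h \ne \pm 1$ lands in the subgroup of quadratic residues, which has prime order $Q$, so any such square other than 1 has order exactly $Q$.

```python
import secrets

_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_prime(n):
    """Miller-Rabin with fixed bases (deterministic for n < 3.3e24)."""
    if n < 2:
        return False
    for b in _BASES:
        if n % b == 0:
            return n == b
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for b in _BASES:
        x = pow(b, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def public_group(bits=64):
    """Return (P, Q, f) with P = 2Q + 1, P and Q prime, and f of order Q."""
    while True:
        Q = secrets.randbits(bits - 1) | (1 << (bits - 2)) | 1   # |Q| = bits - 1
        if is_prime(Q) and is_prime(2 * Q + 1):
            break
    P = 2 * Q + 1
    h = secrets.randbelow(P - 3) + 2      # h in [2, P - 2], so h^2 != 1
    f = pow(h, 2, P)                      # squares form the subgroup of order Q
    return P, Q, f

P, Q, f = public_group(64)
```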



3.1 Proof of Correct Integer Arithmetic 

We shall use $y = f^x \bmod P$ as the one-way function needed to prove correct integer arithmetic. (In the sequel we omit the modulo operations whenever the omission does not cause confusion.)

For integers $a$ and $b$, Alice can use the above one-way function to prove $c = ab$ without revealing $a$, $b$ and $c$. She commits to the values $a$, $b$ and $c$ by sending to a verifier (Bob) their one-way images $(A, B, C) = (f^a, f^b, f^{ab})$ and proves to him that the pre-image of $C$ is the product of those of $A$ and $B$.

Note, however, that for the one-way function used, the multiplication relationship proved above holds modulo $\mathrm{ord}(f)$ (here $\mathrm{ord}(f) = Q$ is the order of the element $f$), and in general it does not demonstrate that $\log_f(C) = \log_f(A) \log_f(B)$ also holds over the integers. Nevertheless, if Alice can show the bit sizes of the respective discrete logarithms (i.e., $|\log_f(A)|$, $|\log_f(B)|$ and $|\log_f(C)|$), then the following lemma guarantees correct multiplication.

Lemma 2. Let $ab \equiv c \pmod{Q}$ and $|c| + 2 < |Q|$. If $|a| + |b| \le |c| + 1$, then $ab = c$.

Proof. Suppose $ab \ne c$. Then $ab = c + \ell Q$ for some integer $\ell \ne 0$. Noticing $0 < c < Q$, we have

$$|a| + |b| \ge |ab| = |c + \ell Q| \ge |Q| - 1 > |c| + 1,$$

contradicting the condition $|a| + |b| \le |c| + 1$. □
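The size conditions of Lemma 2 are easy to check mechanically. The Python sketch below is only an illustration: the toy modulus $Q = 2039$ and the function names are assumptions, and $|x|$ is taken to be `x.bit_length()`. It shows one honest product that passes and one wrap-around ($64 \cdot 32 = 2048 \equiv 9 \pmod{2039}$) that the conditions reject, and then checks the lemma's claim exhaustively on small inputs.

```python
def bits(x):
    """|x|: the size of a non-negative integer x in binary."""
    return x.bit_length()

def lemma2_accepts(a, b, c, Q):
    """Size conditions of Lemma 2 for a claimed c with ab = c (mod Q)."""
    return bits(c) + 2 < bits(Q) and bits(a) + bits(b) <= bits(c) + 1

Q = 2039  # hypothetical toy modulus
print(lemma2_accepts(13, 11, 13 * 11 % Q, Q))   # True: 143 = 13 * 11 over the integers
print(lemma2_accepts(64, 32, 64 * 32 % Q, Q))   # False: 2048 = 9 (mod Q) but 2048 != 9

# Exhaustive check of the lemma's claim on small inputs.
assert all(a * b == (a * b) % Q
           for a in range(1, 300) for b in range(1, 300)
           if lemma2_accepts(a, b, (a * b) % Q, Q))
```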

Thus $y = f^x$ forms a suitable one-way function for proving the correct product of integers in its pre-image space, provided the bit sizes of the pre-images are shown. Given $f^x$, there exist efficient protocols to show $|x|$ without revealing $x$. In the next subsection we specify a simplified variation of the proof which protects $x$ up to computational zero-knowledge. Applying the bit-size proof protocol, we can specify a protocol (predicate) Product(A, B, C) below. The predicate returns 1 (Bob accepts) if $\log_f(C) = \log_f(A) \log_f(B)$, and returns 0 (Bob rejects) otherwise.

Protocol Product($f^a$, $f^b$, $f^{ab}$)

(A run is abandoned with 0 returned if any party finds an error in any checking step; otherwise it returns 1 upon termination.)

Alice sends to Bob $|a|$, $|b|$, $|ab|$, and demonstrates the following evidence:

i) $\log_f(f^a) = \log_{f^b}(f^{ab})$ and $\log_f(f^b) = \log_{f^a}(f^{ab})$;

ii) $|a| + |b| \le |ab| + 1$ and $|ab| < |Q| - 2$.



In Product, (i) can be shown using the protocol of Chaum and Pedersen, and (ii) using a protocol Bit-Size, specified in the next subsection. Note that only the bit sizes of the first two input values need to be proven with Bit-Size because, once the equations in step (i) have been shown, the size of the third value is determined by those of the former two.

We will analyze the security and performance of this protocol in Section 4. 
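Step (i) of Product can be illustrated with a textbook Chaum-Pedersen run, sketched here in Python for an honest prover. The toy group ($P = 2039$, $Q = 1019$, $f = 4$), the secrets $a = 37$, $b = 21$ and the function name are hypothetical; the challenge is drawn from the nonzero residues only so that the rejection case below is deterministic. The prover commits to $t_1 = f^w$ and $t_2 = B^w$, and the verifier checks $f^z = t_1 A^c$ and $B^z = t_2 C^c$, which together show $\log_f(A) = \log_B(C)$.

```python
import secrets

# Hypothetical toy public group (insecure size): P = 2Q + 1, f = 4 has order Q.
P, Q, f = 2039, 1019, 4

def chaum_pedersen(a, A, B, C):
    """Honest-prover run showing log_f(A) = log_B(C) (= a), i.e. C = B^a."""
    w = secrets.randbelow(Q)                      # prover's random input
    t1, t2 = pow(f, w, P), pow(B, w, P)           # commitments
    c = 1 + secrets.randbelow(Q - 1)              # verifier's nonzero challenge
    z = (w + c * a) % Q                           # response
    return (pow(f, z, P) == t1 * pow(A, c, P) % P and
            pow(B, z, P) == t2 * pow(C, c, P) % P)

a, b = 37, 21
A, B, C = pow(f, a, P), pow(f, b, P), pow(f, a * b, P)
assert chaum_pedersen(a, A, B, C)              # log_f(f^a) = log_{f^b}(f^ab)
assert chaum_pedersen(b, B, A, C)              # log_f(f^b) = log_{f^a}(f^ab)
assert not chaum_pedersen(a, A, B, C * f % P)  # a wrong third value is rejected
```

This only proves the relation modulo $Q$; the size evidence of step (ii) is what Lemma 2 needs to lift it to an equation over the integers.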



3.2 Proof of Integer Size 

The basic technique is due to Damgård. We specify a simplified version based on the discrete logarithm problem. In our variation, the number in question is protected in computational (statistical) zero-knowledge.

Let $I$ be the interval $[a, b]$ $(= \{x \mid a \le x \le b\})$, $e = b - a$, and $I \pm e = [a - e, b + e]$. In Protocol Bit-Size specified below, Alice can convince Bob that the discrete logarithm of the input value to the agreed base lies in the interval $I \pm e$.

Protocol Bit-Size($f^s$)

Execute the following $k$ times:

1. Alice picks $0 < t_1 < e$ uniformly at random and sets $t_2 := t_1 - e$; she sends to Bob the unordered pair $C_1 := f^{t_1}$, $C_2 := f^{t_2}$;

2. Bob selects $\delta = 0$ or $\delta = 1$ uniformly at random and sends $\delta$ to Alice;

3. Alice sets $u_1 := t_1$, $u_2 := t_2$ for $\delta = 0$, or $u_1 := t_1 + s$, $u_2 := t_2 + s$ for $\delta = 1$, and sends $u_1, u_2$ to Bob;

4. Bob checks the following (with $i = 1, 2$):

for $\delta = 0$: $-e < u_i < e$ and $C_i = f^{u_i}$;

for $\delta = 1$: $a - e < u_i < b + e$ and $C_i f^s = f^{u_i}$.



Set, for instance, $I := [2^{\ell-1}, 2^{\ell}]$. Then $e$ will be $2^{\ell-1}$, and Bit-Size will prove that the discrete logarithm of the input value does not exceed $\ell + 1$ binary bits.
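A single honest execution of Bit-Size can be simulated in Python as below. This is a sketch under assumptions: the toy group, the interval $I = [2^7, 2^8]$ (i.e., $\ell = 8$), the secrets 200 and 1000, and the function names are hypothetical, and Alice and Bob run in one process. An honest Alice with $s \in I$ passes every round; an Alice whose $s$ lies far outside $I \pm e$ fails every round with $\delta = 1$, so she survives $k$ rounds only if all challenges are 0.

```python
import secrets

P, Q, f = 2039, 1019, 4  # hypothetical toy group: f has order Q in Z_P^*

def bit_size_round(s, Y, lo, hi):
    """One round of Bit-Size on input Y = f^s, for I = [lo, hi]; Bob's verdict."""
    e = hi - lo
    t1 = 1 + secrets.randbelow(e - 1)       # 0 < t1 < e
    ts = [t1, t1 - e]                       # t2 := t1 - e
    if secrets.randbelow(2):                # unordered pair: random order
        ts.reverse()
    Cs = [pow(f, t, P) for t in ts]         # negative exponents need Python 3.8+
    delta = secrets.randbelow(2)            # Bob's challenge bit
    us = ts if delta == 0 else [t + s for t in ts]
    for C, u in zip(Cs, us):
        if delta == 0:
            ok = -e < u < e and C == pow(f, u, P)
        else:
            ok = lo - e < u < hi + e and C * Y % P == pow(f, u, P)
        if not ok:
            return False
    return True

def bit_size(s, Y, lo, hi, k=40):
    """Bob accepts only if all k rounds pass."""
    return all(bit_size_round(s, Y, lo, hi) for _ in range(k))

ell = 8
lo, hi = 2 ** (ell - 1), 2 ** ell          # I = [2^(ell-1), 2^ell], e = 2^(ell-1)
print(bit_size(200, pow(f, 200, P), lo, hi))     # True: honest Alice
print(bit_size(1000, pow(f, 1000, P), lo, hi))   # False unless all 40 challenges are 0
```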
3.3 Proof of the Prime Factors’ Structure

Let $n = pq$ be the user Alice’s public key for an integer-factoring-based cryptosystem. The proposed scheme requires Alice to set the primes $p$ and $q$ with the following structure:

$$p = 2p's + 1, \qquad q = 2q't + 1. \qquad (7)$$

Here $p', q', s, t$ are distinct odd numbers, any two of which are relatively prime, and their sizes satisfy

$$|p'| - |s| = |q'| - |t| = 80. \qquad (8)$$

Then set

$$r = 4st. \qquad (9)$$

As a key-setup procedure, Alice first chooses the numbers $p', q', s, t$ at random with the required sizes, oddness and relative primality. She then tests whether $p$ and $q$ in (7) are prime, repeating the procedure until both are. Since $p', q', s, t$ are odd, both $p$ and $q$ are congruent to 3 modulo 4, making $n$ a Blum integer. We need this property for the proof of the two-prime structure of $n$. It is advisable to choose $p'$ and $q'$ as primes, so that $p$ and $q$ are so-called strong primes (for $n$ of a secure size, $p'$ and $q'$ will be large). This follows a desirable moduli setting for integer-factoring-based cryptosystems.

From (7) and (8) we know

$$|p| - 2|s| \approx |p'| - |s| = 80 = |q'| - |t| \approx |q| - 2|t|.$$

Noting $|p| = |q|$, the above implies $|s| = |t|$, and

$$|p+q| - |r| \approx |p| + 1 - |4st| \approx |p| + 1 - (2 + 2|s|) \in \{79, 80, 81\}.$$
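The key-setup procedure of (7)-(9) can be sketched in Python at reduced size. Everything here is a scaled-down assumption for illustration: the gap is 8 bits instead of 80, $|p'| = |q'| = 24$ bits, the helper names are ours, and the fixed-base Miller-Rabin test is only adequate for these small sizes. The checks at the end mirror the properties claimed in the text: $p \equiv q \equiv 3 \pmod 4$, $r \mid (p-1)(q-1)$ without dividing both factors, and $|p+q| - |r|$ within one of the gap.

```python
import secrets
from itertools import combinations
from math import gcd

_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_prime(n):
    """Miller-Rabin with fixed bases (deterministic for n < 3.3e24)."""
    if n < 2:
        return False
    for b in _BASES:
        if n % b == 0:
            return n == b
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for b in _BASES:
        x = pow(b, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def rand_odd(bits):
    """Random odd integer of exactly `bits` bits."""
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1

def keygen(prime_bits=24, gap=8):
    """Toy setup of (7)-(9): p = 2p's+1, q = 2q't+1, r = 4st, with
    |p'| - |s| = |q'| - |t| = gap (80 in the paper, 8 here)."""
    def half():
        while True:
            pp = rand_odd(prime_bits)                   # p' (chosen prime here)
            s = rand_odd(prime_bits - gap)
            p = 2 * pp * s + 1
            if is_prime(pp) and gcd(pp, s) == 1 and is_prime(p):
                return p, pp, s
    p, pp, s = half()
    while True:
        q, qq, t = half()
        if q.bit_length() == p.bit_length() and all(
                gcd(x, y) == 1 for x, y in combinations((pp, qq, s, t), 2)):
            return p * q, p, q, 4 * s * t

n, p, q, r = keygen()
print(p % 4, q % 4, (p + q).bit_length() - r.bit_length())  # 3 3 and a value in {7, 8, 9}
```

With a gap of 8 the last value always lands in $\{7, 8, 9\}$, mirroring the $\{79, 80, 81\}$ range derived above for a gap of 80.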

Once the key values are fixed, Alice shall publish

$$A = f^p, \quad B = f^q, \quad C = f^r, \quad D = f^{(p-1)(q-1)/r}.$$

Using these published values, Alice can prove to Bob the following two facts:

i) $n = \log_f(A) \log_f(B)$, by running Product($A$, $B$, $f^n$);

ii) $\log_f(C)$ divides $(\log_f(A) - 1)(\log_f(B) - 1)$, by running Product($C$, $D$, $f^{n+1}/(AB)$) (note that the discrete log of the third input is $(\log_f(A) - 1)(\log_f(B) - 1)$).
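Bob can form the third input of the second Product run from public data alone, because $f^{n+1}/(AB) = f^{pq - p - q + 1} = f^{(p-1)(q-1)}$. The Python sketch below checks this identity on toy values; the group and the secrets $p = 23$, $q = 47$, $r = 44$ (which divides $(p-1)(q-1) = 1012$) are hypothetical, and the secrets appear only so the assertions can be evaluated.

```python
P, Q, f = 2039, 1019, 4        # hypothetical toy group: f has order Q in Z_P^*
p, q, r = 23, 47, 44           # toy secrets: r = 44 divides (p-1)(q-1) = 1012
n, phi = p * q, (p - 1) * (q - 1)
A, B, C, D = (pow(f, x, P) for x in (p, q, r, phi // r))  # published values

# Bob derives the third input of Product(C, D, .) from public data only:
# f^(n+1) / (A*B) = f^(pq - p - q + 1) = f^((p-1)(q-1)).
T = pow(f, n + 1, P) * pow(A * B % P, -1, P) % P
assert T == pow(f, phi, P)

# The two Diffie-Hellman-style relations that the Product runs certify
# (in the group only; Lemma 2's size checks lift them to integer equations):
assert pow(A, q, P) == pow(f, n, P)   # log_f(f^n) = log_f(A) * log_f(B)
assert pow(C, phi // r, P) == T       # log_f(T) = log_f(C) * log_f(D)
```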

While running Product, Alice has also demonstrated the bit sizes $|\log_f(A)|$, $|\log_f(B)|$ and $|\log_f(C)|$. Bob should check that

$$|\log_f(A)| - |\log_f(C)| \approx \frac{|n|}{2} - |\log_f(C)| \in \{79, 80, 81\}.$$

Finally, Alice should prove that $n$ consists of only two prime factors. This can be achieved by applying the protocol of Van de Graaf and Peralta for proof of Blum integers (we have constructed $n$ to be a Blum integer), and the protocol of Boyar et al. for proof of square-free integers.
3.4 Verifiable Secret Sharing of r

For self-containment, we include Pedersen’s threshold verifiable secret sharing scheme for sharing the secret $r$ among multiple shareholders with threshold recoverability.

Let the system use $m$ shareholders. Using Shamir’s $t$ ($< m$) out of $m$ threshold secret sharing method, Alice constructs a polynomial $r(x)$ of degree $t - 1$:

$$r(x) = \sum_{i=0}^{t-1} c_i x^i \bmod Q,$$

where the coefficients $c_1, c_2, \ldots, c_{t-1}$ are chosen at random from $\mathbb{Z}_Q$ and $c_0 = r$. The polynomial satisfies $r(0) = r$. She sends the secret shares $r(i)$ ($i = 1, 2, \ldots, m$) to the $m$ shareholders, respectively (via secret channels), and publishes

$$f^{r(i)} \bmod P \quad \text{for } i = 0, 1, \ldots, m,$$

and

$$f^{c_j} \bmod P \quad \text{for } j = 0, 1, \ldots, t-1.$$

Each shareholder can verify

$$f^{r(i)} \equiv \prod_{j=0}^{t-1} \left(f^{c_j}\right)^{i^j} \pmod{P} \quad \text{for } i = 1, 2, \ldots, m.$$

Assume that at least $t$ of the $m$ shareholders are honest and perform the checks correctly. Then the polynomial $r(x)$ is secretly shared among them. When key recovery is needed, they can use their secret shares to interpolate $r(x)$ and recover $r = r(0)$.
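The dealing, share verification and recovery steps can be sketched in Python on the toy group used earlier. The values ($r = 777$, $t = 3$, $m = 5$), the group and the function names are hypothetical illustration choices, and the published values $f^{r(i)}$ are omitted since the check only needs the coefficient commitments. Because $f$ has order $Q$, a share altered by any nonzero amount modulo $Q$ fails the check.

```python
import secrets

P, Q, f = 2039, 1019, 4  # hypothetical toy group: f has order Q in Z_P^*

def deal(r, t, m):
    """Shamir (t, m) sharing of r over Z_Q plus commitments f^{c_j} mod P."""
    coeffs = [r % Q] + [secrets.randbelow(Q) for _ in range(t - 1)]  # c_0 = r
    shares = {i: sum(c * pow(i, j, Q) for j, c in enumerate(coeffs)) % Q
              for i in range(1, m + 1)}
    commitments = [pow(f, c, P) for c in coeffs]
    return shares, commitments

def verify_share(i, share, commitments):
    """Shareholder i checks f^{r(i)} == prod_j (f^{c_j})^(i^j) (mod P)."""
    rhs = 1
    for j, E in enumerate(commitments):
        rhs = rhs * pow(E, i ** j, P) % P
    return pow(f, share, P) == rhs

def recover(subset):
    """Lagrange interpolation of r(0) over Z_Q from t shares {i: r(i)}."""
    secret = 0
    for i, y in subset.items():
        num = den = 1
        for j in subset:
            if j != i:
                num = num * (-j) % Q
                den = den * (i - j) % Q
        secret = (secret + y * num * pow(den, -1, Q)) % Q
    return secret

shares, comms = deal(777, t=3, m=5)
assert all(verify_share(i, y, comms) for i, y in shares.items())
assert recover({i: shares[i] for i in (1, 3, 5)}) == 777
assert not verify_share(2, (shares[2] + 1) % Q, comms)
```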

We should point out that this sub-protocol is not a necessary component of the proposed scheme. There exist other schemes for proving correct threshold encryption of a discrete logarithm (e.g., [?] achieves verifiable secret sharing with public verifiability; that is, secret sharing can be done without the presence of the $m$ shareholders and without assuming that $t$ of them are honest). We have chosen Pedersen’s scheme for simplicity of presentation.

4 Analysis 

We now analyze the security and performance of the proposed verifiable partial integer-factor sharing scheme.



4.1 Security 

The security of the proposed scheme consists of correctness and privacy. The former concerns whether Alice can cheat Bob into accepting her proof using a key whose private component is either not recoverable or recoverable only at a much higher cost than that of the procedure specified in Section 2. The latter concerns whether Bob can gain any knowledge, as a result of verifying Alice’s proof, that leads to discovery of Alice’s private key without going through the prescribed key recovery procedure.



Correctness 

In Section 2 we established the precise cost of key recovery. The remark following Lemma 1 further emphasizes that, should there exist an algorithm that finds the missing key part with fewer than $2^{40}$ operations, that algorithm would also be a breakthrough improvement over all methods known to date for computing small discrete logarithms. The remaining correctness issue concerns the sub-protocols that realize the mathematics established in Section 2.

The two sub-protocols that construct Product have well-understood correctness properties. The cheating probability for proof of a Diffie-Hellman triple is $1/Q$, where $Q$ is the order of $f$, and that for Bit-Size is $1/2^k$, where $k$ is the number of iterations in it. Using $k = 40$ achieves a sufficiently low chance of successful cheating by Alice. So we can use $1/2^{40}$ as the cheating probability for Product (since $1/Q \ll 1/2^{40}$ for usual sizes of $Q$).

Next, the correctness of the protocol for proof of the two-prime structure is also well established, with error probability $1/2^k$ for $k$ iterations of message verification. Again, $k$ can be set to 40.

We note that in the proofs of the structural validity of $n$ (i.e., $n = pq$, the primality of $p$ and $q$, the relation $r \mid (p-1)(q-1)$, and the required sizes of these numbers), the verification does not involve handling any secret. Therefore it can be carried out by anybody and can be repeated if necessary. So Bob cannot collude with Alice without risking being caught.

Finally, the protocol for verifiable sharing of $r$ has a two-sided error. The probability of Alice cheating successfully is $1/Q$, which is negligible as $Q$ is sufficiently large. The other side of the error is determined by the number of dishonest shareholders. The number $r$ can be correctly recovered if no fewer than $t$ of the $m$ shareholders are honest (i.e., they follow the protocol). Usually the threshold value $t$ is set to $\lfloor m/2 \rfloor$.



Privacy 

Although each of the sub-protocols used in the construction of the main scheme has its own proven privacy, we cannot assume that they still provide good privacy for the main scheme when run in combination. Thus, we study the privacy of the main scheme in the following manner: (i) the values made public in the main scheme do not themselves leak useful information for finding the prime factors, and (ii) when viewed together, they do not produce a formula usable for finding the prime factors. We emphasize that property (ii) is essential, as we must be sure that the published values do not damage the original privacy of each sub-protocol used in the main scheme.
In addition to the public key $n$, Alice has published the following values in the main scheme:

$$A = f^p, \quad B = f^q, \quad C = f^r, \quad D = f^{(p-1)(q-1)/r}, \quad |p| = |q| = \frac{|n|}{2}, \quad |r|.$$




First of all, there is clearly no danger in disclosing the bit-size information of the secret numbers. So we need only consider the first four numbers: $A$, $B$, $C$ and $D$.

We note that $(A, B, f^n)$ and $(C, D, f^{n+1}/(AB))$ form two Diffie-Hellman triples. This fact, however, does not help Bob find the discrete logarithms of $A$, $B$, $C$ and $D$, by the security basis of the Diffie-Hellman problem. Analogously, no arithmetic on these numbers produces anything easier than the original Diffie-Hellman problem.

We should note that, even for $n$ of the widely believed minimum size of 512 bits, the discrete logarithms of the published values above are all sufficiently large. The smallest is $r$, whose size satisfies

$$|r| > 256 - 81 = 175,$$

which is still sufficiently large to be immune to the square-root attack aimed at extracting it from $C$.

Next we review whether any of the published numbers damages the original privacy of the sub-protocols used. We can safely begin by setting aside any impact on the two sub-protocols used in the proof of the two-prime structure of $n$, because those protocols involve calculations in a different group: $\mathbb{Z}_n^*$ rather than $\mathbb{Z}_P^*$.

The two sub-protocols that construct Product require Alice to hide the private input values (which are discrete logs of the inputs to Product or to Bit-Size) by using a genuinely random number in each round of message exchange (these random numbers form the prover’s random input; see the Chaum-Pedersen protocol for proof of a Diffie-Hellman triple, and Section 3.2 for proof of integer size). In both sub-protocols, the prover’s random input has a size similar to that of the private input. In fact, the sub-protocol for proof of a Diffie-Hellman triple is perfect (computational) zero-knowledge, and that for proof of bit size is statistical zero-knowledge. As a result, Product can be simulated in the standard way for demonstrating an honest-verifier zero-knowledge protocol.

Finally, we point out that Pedersen’s protocol for verifiable sharing of $r$ (the discrete log of $C$) has well-understood perfect privacy, following the privacy of Shamir’s secret sharing scheme based on polynomial interpolation.



4.2 Performance 

The main scheme consists of (i) verifiable secret sharing of $r$, (ii) two runs of Product, and (iii) proof of the two-prime structure of $n$.

Since (i) is common to verifiable secret sharing schemes, we analyze only the performance of (ii) and (iii), as the additional cost of achieving verifiable partial sharing of integer factors.
The major cost in a run of Product is proving the sizes of two integers (running Bit-Size twice), plus a trivial cost for proving discrete logarithm equality (a three-move protocol). For the two runs of Product, the total data to be exchanged is bounded above by $(4k + 2)|P|$ binary bits, where $k$ is the number of iterations in Bit-Size. The number of computations to be performed by the prover and the verifier can be expressed as $4k|P|$ (multiplications modulo $P$).

Next, in the proof of the two-prime structure of $n$, the protocol of Van de Graaf and Peralta involves agreeing on $k$ pairs of random numbers in $\mathbb{Z}_n^*$ and $k$ bits as random signs. For each pair of numbers agreed, the two parties evaluate a Jacobi symbol, which costs about as much as a few multiplications in $\mathbb{Z}_n^*$. So the data to be exchanged here is bounded above by $(2k + 1)|n|$ binary bits, where $k$ can be the same as in Bit-Size. A similar estimate applies to the protocol for proof of square-free numbers. The number of computations to be performed by the prover and the verifier can be expressed as $2k|n|$ (multiplications modulo $n$). We can use $|P|$ as an upper bound on $|n|$.

Combining the results of the two paragraphs above, we conclude with the following statement on the performance of the proposed scheme.



Lemma 3. With a verifiable secret sharing scheme for sharing the discrete logarithm of an integer, partial sharing of the prime factors of an integer of size up to $|P| - 3$ binary bits can be achieved by further transmitting $(8k + 3)|P|$ binary bits, with both the prover and the verifier computing $6k|P|$ multiplications. □



If $|P|$ is set to 1540, then the maximum size of the integers that can be dealt with by the proposed scheme is 1536 binary bits ($= 3 \times 512$). Setting $k = 40$ gives a sufficiently small probability of $1/2^{40}$ for the prover to cheat successfully. Then the total number of bits to be transmitted is 497,420 (about 62 kilobytes). A slightly smaller number of multiplications (modulo $P$ and modulo $n$) needs to be performed by both the prover and the verifier.
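The concrete figures follow directly from the formulas of Lemma 3, taken here as stated; a few lines of Python reproduce them (the function name is ours, for illustration only).

```python
def lemma3_cost(p_bits, k):
    """Additional cost stated in Lemma 3: (8k + 3)|P| bits sent and
    6k|P| multiplications for each of the prover and the verifier."""
    return (8 * k + 3) * p_bits, 6 * k * p_bits

bits_sent, mults = lemma3_cost(1540, 40)
print(bits_sent, bits_sent // 8, mults)   # 497420 62177 369600
```

That is, 497,420 bits or 62,177 bytes, against 369,600 multiplications per party, which is the "slightly smaller number" referred to above.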

5 Conclusion 

We have constructed a verifiable partial integer-factor sharing scheme and shown that the scheme is secure and practically efficient. The working principle of the scheme is an observation of a significantly lowered time complexity for factoring $n = pq$ using a known factor of $\phi(n)$. This is of independent interest in that the lowered complexity bound should be regarded as essential knowledge for designing protocols or cryptosystems based on disclosing a factor of $\phi(n)$ of non-trivial size.
Abstract. This paper introduces an improved higher order differential attack using chosen higher order differences. We can find a lower order of the higher order differential by choosing higher order differences. It follows that the designers of a block cipher can evaluate the lower bound of the number of chosen plaintexts and the complexity required for the higher order differential attack. We demonstrate an improved higher order differential attack of a CAST cipher with 5 rounds using chosen higher order differences with fewer chosen plaintexts and less complexity. Concretely, we show that a CAST cipher with 5 rounds is breakable with $2^{?}$ plaintexts and $< 2^{?}$ times the computation of the round function, which are half the values reported in Fast Software Encryption Workshop’98. We also show that it is breakable with $2^{?}$ plaintexts and about $2^{?}$ times the computation of the round function, which are [?] of those reported in Fast Software Encryption Workshop’97.



1 Introduction 

Higher order differential attack is a powerful algebraic cryptanalytic technique. It is useful for attacking ciphers which can be represented as Boolean polynomials of low degree. After Lai mentioned the cryptographic significance of derivatives of Boolean functions, Knudsen used this notion to attack ciphers that were secure against conventional differential attacks. At FSE’97, Jakobsen and Knudsen gave an extension of Knudsen’s attacks and broke ciphers using quadratic functions, such as the cipher $\mathcal{KN}$. These ciphers were provably secure against differential and linear cryptanalysis. Furthermore, at Information Security Workshop’97, we reduced the complexity required for the higher order differential attack of the cipher $\mathcal{KN}$ by solving the attack equation algebraically. At Fast Software Encryption Workshop’98 we generalized it and applied it to a CAST cipher.



S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 1999. 
This paper introduces an improved higher order differential attack using chosen higher order differences. The higher order differential attack exploits the fact that a higher order differential (e.g., the $d$-th order differential) of intermediate data is constant, or independent of the key. In this paper we call the order “$d$” the order of the higher order differential.

In the known higher order differential attack [?, Theorem 1], the order of the higher order differential was found from the algebraic degree of the Boolean polynomials of the ciphertexts. That is, if a ciphertext bit is represented by a Boolean polynomial of plaintext bits of degree $d$, then $d + 1$ is the order of the higher order differential, since the $(d + 1)$-th order differential of the ciphertexts becomes 0. Furthermore, in the higher order differential attack described in [?], it was shown that if all subkeys are combined using the XOR operation in a Feistel cipher, the order of the higher order differential is equal to the algebraic degree of the Boolean polynomials of the ciphertexts. That is, if a ciphertext bit is represented by a Boolean polynomial of plaintexts of degree $d$, then $d$ is the order of the higher order differential, since the $d$-th order differential of the ciphertexts becomes 1. It is known that the order of the higher order differential determines the required number of plaintext/ciphertext pairs (p/c pairs) and the complexity. Therefore, it is important to find the lowest order of the higher order differential to estimate the security of a cipher against the higher order differential attack.

This paper shows that we can find a lower order of the higher order differential by choosing higher order differences. As an example, we demonstrate the higher order differential attack of a CAST cipher using chosen higher order differences with fewer chosen p/c pairs and less complexity. Concretely, we show that a CAST cipher with 5 rounds is breakable with $2^{?}$ plaintexts and $< 2^{?}$ times the computation of the round function, which are half the values reported in Fast Software Encryption Workshop’98. We also show that it is breakable with $2^{?}$ plaintexts and about $2^{?}$ times the computation of the round function, which are [?] of those reported in Fast Software Encryption Workshop’97. We apply the improved higher order differential attack to a CAST cipher with 5 rounds because we want to show how much improvement is achieved by choosing higher order differences. This attack is also applicable to other block ciphers. A similar improved higher order differential attack of a 5-round MISTY without FL functions is shown in [?].



2 Higher Order Differential Attack 

This section gives an outline of the higher order differential attack. Fuller descriptions of the attack are presented in the references.

Let the target be a Feistel cipher with block size 64 bits and $r$ rounds. We assume that the right half 32 bits of the plaintext are fixed at an arbitrary value. We denote the left half 32 bits of the plaintext by $x = (x_{31}, \ldots, x_0) \in GF(2)^{32}$, the ciphertext by $y = (y_L, y_R)$, $y_L, y_R \in GF(2)^{32}$, and the key by $k = (k_{l-1}, \ldots, k_0)$, where the key length is $l$ bits. Let $X$ and $K$ be sets of variables s.t. $X = \{x_{31}, \ldots, x_0\}$ and $K = \{k_{l-1}, \ldots, k_0\}$. Let $k^{(i)}$ be the $i$-th round key. Throughout this paper, the subscript 0 indicates the least significant bit of the data.

When the key $k$ is fixed, an intermediate bit in the encryption process, denoted by $z \in GF(2)$, can be represented as a Boolean polynomial with respect to $X$ whose coefficients are Boolean polynomials with respect to $K$, i.e., $z = g[k](x)$, where

$$g[k](x) = \sum_{i_{31},\ldots,i_0} c_{i_{31},\ldots,i_0}(k)\, x_{31}^{i_{31}} \cdots x_0^{i_0}.$$

Note that $c_{i_{31},\ldots,i_0}(k)$ is the coefficient of $x_{31}^{i_{31}} \cdots x_0^{i_0}$, and each of $i_{31}, \ldots, i_0$ is 0 or 1.

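Definition 1. We define the $i$-th order differential of $g[k](x)$ with respect to $X$, denoted by $\Delta^{(i)}_{(a_i,\ldots,a_1)} g[k](x)$, as follows:

$$\Delta^{(i)}_{(a_i,\ldots,a_1)} g[k](x) = \Delta_{(a_i)}\bigl(\Delta^{(i-1)}_{(a_{i-1},\ldots,a_1)} g[k](x)\bigr), \qquad \Delta_{(a)} g[k](x) = g[k](x) + g[k](x + a),$$

where $a \in GF(2)^{32}$, and $\{a_1, \ldots, a_i\} \subset GF(2)^{32}$ are linearly independent. Let “+” denote bitwise XOR. In this paper, since we consider only the higher order differential with respect to $X$, we omit “with respect to $X$.”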

Definition 2. For $\Delta^{(i)}_{(a_i,\ldots,a_1)} g[k](x)$, which is the $i$-th order differential of $g[k](x)$, we define $i$ as the order of the higher order differential. Furthermore, we define $\{a_i, a_{i-1}, \ldots, a_1\}$ as the $i$-th order differences. The $i$-th order differences consist of an $i$-tuple of differences in $GF(2)^{32}$.

The following theorems are known on the higher order differential of Boolean functions [ ].

Theorem 1 ([ ]). The following equation holds for any $b \in GF(2)^{32}$ and linearly independent $\{a_1, \ldots, a_i\} \subset GF(2)^{32}$:

$$\Delta^{(i)}_{(a_i,\ldots,a_1)} g[k](b) = \sum_{x \in V^{(i)}[a_1,\ldots,a_i]} g[k](x + b).$$

Note that $V^{(i)}[a_1, \ldots, a_i]$ denotes the $i$-dimensional subspace spanned by $\{a_1, \ldots, a_i\}$. In other words, it is the set of all $2^i$ possible linear combinations of $a_1, \ldots, a_i$, where each $a_j$ is in $GF(2)^{32}$ and the $a_j$ are linearly independent.

Theorem 2 ([ ]). Let $d$ be a natural number, and let $\{a_{d+1}, \ldots, a_1\} \subset GF(2)^{32}$ be linearly independent. If the degree of $g[k](x)$ with respect to $X$ is $d$, then the following equations hold:

$$\Delta^{(d)}_{(a_d,\ldots,a_1)} g[k](x) \in R[K], \qquad \Delta^{(d+1)}_{(a_{d+1},\ldots,a_1)} g[k](x) = 0,$$

where $R[K]$ is the Boolean polynomial ring of $K$.
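A toy illustration of Theorem 2 follows. It is our own example, not taken from the paper. The hypothetical keyed polynomial below has degree 3 in $X = \{x_3, x_2, x_1, x_0\}$. Its 3rd order differential does not depend on the base point $b$ (it lies in $R[K]$), and every 4th order differential vanishes. The differential is computed as a coset sum, following Theorem 1.

```python
from typing import List


def bit(v: int, i: int) -> int:
    return (v >> i) & 1


def g(k: int, x: int) -> int:
    """Hypothetical keyed Boolean polynomial of degree 3 in x = (x3, x2, x1, x0):
    (x0 + k0)(x1 + k1)(x2 + k2) + k3*x3 over GF(2)."""
    cubic = (bit(x, 0) ^ bit(k, 0)) & (bit(x, 1) ^ bit(k, 1)) & (bit(x, 2) ^ bit(k, 2))
    return cubic ^ (bit(k, 3) & bit(x, 3))


def higher_order_diff(k: int, diffs: List[int], b: int) -> int:
    """Delta^(i) g[k](b) computed as a sum over the coset V^(i)[a_1..a_i] + b."""
    points = [0]
    for a in diffs:
        points += [p ^ a for p in points]
    acc = 0
    for v in points:
        acc ^= g(k, v ^ b)
    return acc


for k in (0b0000, 0b1011):
    third = {higher_order_diff(k, [0b0001, 0b0010, 0b0100], b) for b in range(16)}
    fourth = {higher_order_diff(k, [0b0001, 0b0010, 0b0100, 0b1000], b) for b in range(16)}
    print(f"key={k:04b}: 3rd order -> {third}, 4th order -> {fourth}")
```

Both keys give {1} for the 3rd order, because with differences $e_0, e_1, e_2$ this extracts the coefficient of $x_2x_1x_0$. Both give {0} for the 4th order. For other independent triples the 3rd order value may depend on the key, but never on $b$.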
Fig. 1. A higher order differential attack of Feistel ciphers



Attack Procedure. In the improved higher order differential attack described in this paper, the last round key is recovered as follows (see also Fig. 1).

Step 1. Find the lowest order $d$ such that, for the $d$-th order differences $\{a_d, \ldots, a_1\} \subset GF(2)^{32}$ and $b \in GF(2)^{32}$, the $d$-th order differential of $\tilde{y}_R(x)$ (the right half of the output of the $(r-1)$-th round), i.e., $\Delta^{(d)}_{(a_d,\ldots,a_1)}\tilde{y}_R(b)$, is independent of the key.

Step 2. Construct attack equation (1) and solve it with respect to the last round key $k^{(r)}$:

$$\sum_{x \in V^{(d)}[a_d,\ldots,a_1]+b} f\bigl(y_R(x) + k^{(r)}\bigr) + \sum_{x \in V^{(d)}[a_d,\ldots,a_1]+b} y_L(x) = \Delta^{(d)}_{(a_d,\ldots,a_1)}\tilde{y}_R(b) \quad (1)$$

One way to find the last round key $k^{(r)}$ is exhaustive search, where the attacker tries all $2^{32}$ possible candidates of $k^{(r)}$ and finds the correct key. Another way is an algebraic solution where the attacker, for example, transforms the algebraic equations into a system of linear equations and solves it.
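The exhaustive-search variant of Step 2 can be sketched on a toy cipher. Everything in this sketch is hypothetical: 8-bit halves, 4 rounds, the key values, and a quadratic round function `f` that stands in for the real one. In this toy the intermediate value $\tilde{y}_R(x) = f(x + f(k^{(1)}) + k^{(2)})$ has degree 2, so its 2nd order differential equals that of `f` itself and is key-independent (Step 1). Each candidate for the last round key is then tested against attack equation (1).

```python
from functools import reduce
from operator import xor
from typing import Callable, List, Set, Tuple

MASK = 0xFF  # toy 8-bit halves instead of 32-bit ones (assumption)


def rotl(x: int, s: int) -> int:
    """Rotate an 8-bit word left by s bits."""
    return ((x << s) | (x >> (8 - s))) & MASK


def f(x: int) -> int:
    """Hypothetical toy round function of algebraic degree 2 (not a CAST S-box)."""
    return x ^ (rotl(x, 1) & rotl(x, 2))


def encrypt(x: int, keys: List[int]) -> Tuple[int, int]:
    """Feistel encryption of plaintext (x, 0); returns (y_L, y_R) such that
    y_L = y~_R + f(y_R + k_last), the form used in attack equation (1)."""
    left, right = x, 0
    for k in keys:
        left, right = right, left ^ f(right ^ k)
    return right, left


def span(diffs: List[int]) -> List[int]:
    points = [0]
    for a in diffs:
        points += [p ^ a for p in points]
    return points


def last_key_candidates(oracle: Callable[[int], Tuple[int, int]],
                        diff_sets: List[List[int]], base: int = 0x5A) -> Set[int]:
    """Step 2 by exhaustive search: keep every k satisfying equation (1) for all sets."""
    survivors = set(range(MASK + 1))
    for diffs in diff_sets:
        cts = [oracle(v ^ base) for v in span(diffs)]
        # Step 1 for this toy: y~_R = f(x + const) is quadratic, so its 2nd order
        # differential equals that of f itself and is independent of the key.
        rhs = reduce(xor, (f(v) for v in span(diffs)))
        sum_yl = reduce(xor, (yl for yl, _ in cts))
        survivors = {k for k in survivors
                     if sum_yl ^ reduce(xor, (f(yr ^ k) for _, yr in cts)) == rhs}
    return survivors


KEYS = [0x3C, 0xA7, 0x19, 0xE2]  # hypothetical round keys k(1)..k(4)
found = last_key_candidates(lambda x: encrypt(x, KEYS),
                            [[0x01, 0x02], [0x04, 0x10], [0x20, 0x80], [0x03, 0x40]])
print(f"true k4 = {KEYS[-1]:#04x}, surviving candidates: {sorted(found)}")
```

The correct key always satisfies every equation. Wrong keys can survive one set of differences by chance, which is why several sets are intersected. For the 32-bit CAST case this loop has $2^{32}$ candidates, which motivates the complexity estimates in Section 4.2.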



3 CAST Ciphers 

This section describes CAST ciphers, which we use for demonstrating the im- 
proved higher order differential attack. 




Fig. 2. Round function of CAST ciphers 



CAST ciphers are a family of symmetric ciphers constructed using the CAST design procedure proposed by Adams and Tavares [ ]. According to the CAST design procedure, these ciphers appear to have good resistance to differential cryptanalysis [ ], linear cryptanalysis [ ] and related-key cryptanalysis [ ].

CAST ciphers are based on the framework of the Feistel cipher. The round function is specified as follows (see also Fig. 2). A 32-bit data half is input to the function along with a subkey. These two quantities are combined using operation “a” and the 32-bit result is split into four 8-bit pieces. Each piece is input to a different 8 × 32 S-box ($S_1$, $S_2$, $S_3$, and $S_4$). S-boxes $S_1$ and $S_2$ are combined using operation “b”; the result is combined with $S_3$ using operation “c”; this second result is combined with $S_4$ using operation “d”. The final 32-bit result is the output of the round function.

The CAST design procedure allows a wide variety of possible round functions: 4 S-boxes and 4 operations (a, b, c, and d). As for S-boxes, reference [ ] suggested constructing the S-boxes from bent functions. Later, in reference [ ], CAST ciphers with random S-boxes were proposed. In our attack, we use the S-boxes based on bent functions proposed for CAST-128. CAST-128 is a well-known example of a CAST cipher and is used in several commercial applications, e.g., PGP 5.0. As for operations, a simple way to define the round function is to specify that all operations are XORs, i.e., addition on GF(2), although other operations may be used instead. Indeed, according to reference [ ], some care in the choice of operation “a” can conceivably give intrinsic immunity to differential and linear cryptanalysis.
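The all-XOR round function described above can be sketched as follows. This is an illustration under stated assumptions, not CAST-128. The S-box tables are random placeholders rather than the bent-function S-boxes. We assume the most significant byte feeds $S_1$. Real CAST-128 also uses other operations and key-dependent rotations, which the attack here does not assume.

```python
import random
from typing import List, Sequence

SBox = Sequence[int]  # 256 entries, each a 32-bit word (an 8 x 32 S-box)


def round_function_xor(data: int, subkey: int, sboxes: Sequence[SBox]) -> int:
    """CAST-style round function with operations a, b, c, d all taken as XOR.

    Assumption of this sketch: the most significant byte of the combined value
    feeds S1 and the least significant byte feeds S4.
    """
    s1, s2, s3, s4 = sboxes
    i = (data ^ subkey) & 0xFFFFFFFF                 # operation "a"
    i1, i2, i3, i4 = (i >> 24) & 0xFF, (i >> 16) & 0xFF, (i >> 8) & 0xFF, i & 0xFF
    out = s1[i1] ^ s2[i2]                            # operation "b"
    out ^= s3[i3]                                    # operation "c"
    out ^= s4[i4]                                    # operation "d"
    return out


def random_sboxes(seed: int) -> List[List[int]]:
    """Placeholder random S-boxes; NOT the bent-function S-boxes of CAST-128."""
    rng = random.Random(seed)
    return [[rng.getrandbits(32) for _ in range(256)] for _ in range(4)]


boxes = random_sboxes(1)
print(hex(round_function_xor(0x01234567, 0x89ABCDEF, boxes)))
```

With every operation an XOR, each output bit is a GF(2) sum of S-box coordinate functions. The algebraic degree of the round function is therefore at most the largest degree among the S-box output bits, which is the property used in Section 4.1.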
As for the number of rounds, the CAST design procedure does not seem to specify a concrete number. For example, CAST-128 has 12 or 16 rounds [ ]. There are also several key schedules for CAST ciphers, but for the purpose of our attack the key schedule makes no difference.



4 Higher Order Differential Attacks of a 5-round CAST 
Cipher using Chosen Higher Order Differences 

This section demonstrates an improved higher order differential attack using chosen higher order differences. The target is the CAST cipher with 5 rounds described in Section 3. The improvement lies in Step 1 of the attack procedure in Section 2.



4.1 How to find the lowest order of the higher order differential 

Using degree of Boolean polynomials. In the previously known higher order differential attack, the order of the higher order differential was derived using the degree of the Boolean polynomials of $\tilde{y}_R(x)$ with respect to $X$.

The order of the higher order differential of the CAST cipher with 5 rounds is found as follows. We begin by considering the degree of the round function. Let $S_1$, $S_2$, $S_3$, and $S_4$ be the S-box functions $GF(2)^8 \to GF(2)^{32}$. It is shown in [ ] that for every S-box all output bits can be represented by Boolean polynomials of input bits of degree 4. Considering the structure of the round function (see Fig. 2), all output bits of the round function can be represented by Boolean polynomials of input bits of degree 4, since we assume that operations “a”, “b”, “c”, and “d” are XORs [ ].
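The degree bookkeeping used here can be mechanised for small functions through the algebraic normal form (ANF). The sketch below computes the algebraic degree with the Möbius transform. The 8-bit quadratic `toy_round` is hypothetical and only stands in for a real round function. For an actual 8 × 32 S-box table `S`, `vector_degree(lambda x: S[x], 8, 32)` would compute the quantity discussed in the text. We have not run it on the CAST-128 tables.

```python
from typing import Callable, List


def anf(truth: List[int]) -> List[int]:
    """Moebius transform: truth table (length 2^n) -> ANF coefficients.
    coeffs[m] is the coefficient of the monomial prod_{i in m} x_i."""
    coeffs = list(truth)
    step = 1
    while step < len(coeffs):
        for m in range(len(coeffs)):
            if m & step:
                coeffs[m] ^= coeffs[m ^ step]
        step <<= 1
    return coeffs


def degree(truth: List[int]) -> int:
    """Algebraic degree of a Boolean function (0 is returned for the zero function)."""
    return max((bin(m).count("1") for m, c in enumerate(anf(truth)) if c), default=0)


def vector_degree(func: Callable[[int], int], n_in: int, n_out: int) -> int:
    """Degree of a vectorial function = maximum degree over its output bits."""
    return max(
        degree([(func(x) >> j) & 1 for x in range(1 << n_in)]) for j in range(n_out)
    )


def toy_round(x: int) -> int:
    """Hypothetical 8-bit quadratic function used only for illustration."""
    rot1 = ((x << 1) | (x >> 7)) & 0xFF
    rot2 = ((x << 2) | (x >> 6)) & 0xFF
    return x ^ (rot1 & rot2)


print(vector_degree(toy_round, 8, 8))                                  # degree 2
print(vector_degree(lambda x: toy_round(toy_round(x) ^ 0x5A), 8, 8))  # at most 2 * 2
```

Composition multiplies the degree bound. This is the same reasoning that gives degree at most 4 · 4 = 16 for two nested degree-4 round functions in Eq. (2).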

If we fix the right half of the plaintext at $0 \in GF(2)^{32}$, the right half of the output of the 4-th round, $\tilde{y}_R(x) \in GF(2)^{32}$, can be represented as Eq. (2):

$$\tilde{y}_R(x) = f\bigl(f(x + f(k^{(1)}) + k^{(2)}) + k^{(3)}\bigr) + x + f(k^{(1)}), \quad (2)$$

where $f: GF(2)^{32} \to GF(2)^{32}$ is the round function. Since $f$ can be represented by Boolean polynomials of input bits of degree 4, the degree of $f(x + f(k^{(1)}) + k^{(2)})$ with respect to $X = \{x_{31}, \ldots, x_0\}$ is 4, and the terms of the 4-th degree have coefficients in $GF(2)^{32}$, which means that they are independent of the key. Hence, the degree of $\tilde{y}_R(x)$ with respect to $X$ is at most 16, and the terms of the 16-th degree included in Eq. (2) have coefficients in $GF(2)^{32}$, which means that they are independent of the key. Therefore, the 16-th order differential of $\tilde{y}_R(x)$ is constant for any linearly independent 16-th order differences $\{a_{16}, a_{15}, \ldots, a_1\}$:

$$\Delta^{(16)}_{(a_{16},\ldots,a_1)}\tilde{y}_R(x) = C \in GF(2)^{32} \quad (\text{const.})$$
Using chosen higher order differences. In this section, we show that if we choose the higher order differences, some higher order differential attacks of the CAST cipher with 5 rounds are possible where the order of the higher order differential is less than 16. Whether such an attack is possible depends on whether a higher order differential of $\tilde{y}_R(x)$ of order less than 16 is independent of the key.

[When the order of the higher order differential is 15] Let us consider the 15-th order differential of $\tilde{y}_R(x)$. First, we prove the following lemma.

Lemma 1. If we choose $\{e_{14}, e_{13}, \ldots, e_0\}$ for the 15-th order differences $\{a_{15}, a_{14}, \ldots, a_1\}$, each bit of the 15-th order differential of $\tilde{y}_R(x)$ is constant or linear with respect to a key bit. Note that $e_i$ is defined as

$$e_i = (0, \ldots, 0, 1, 0, \ldots, 0) \in GF(2)^{32},$$

where only the $i$-th bit is 1.

Proof. The 15-th order differential of $\tilde{y}_R(x)$ for the 15-th order differences $\{e_{14}, e_{13}, \ldots, e_0\}$ is the same as the 15-th order differential of $\tilde{y}_R(x)$ with respect to $\{x_{14}, x_{13}, \ldots, x_0\}$. Therefore, the 15-th order differential of $\tilde{y}_R(x)$ does not contain the terms that do not include all variables of $\{x_{14}, x_{13}, \ldots, x_0\}$. All the terms of $\tilde{y}_R(x)$ that remain in the 15-th order differential are as follows:

$$\text{degree-15: } c_1(k)\, x_{14} x_{13} \cdots x_0 \quad (3)$$
$$\text{degree-16: } c_2(k)\, x_{15} x_{14} \cdots x_0 \quad (4)$$

First, we show why terms such as $c_3(k)\, x_{16} x_{14} \cdots x_0$ do not remain. Let $A_1$, $A_2$, $A_3$, and $A_4$ be sets of variables as follows:

$$A_1 = \{x_{31}, x_{30}, \ldots, x_{24}\}, \quad A_2 = \{x_{23}, x_{22}, \ldots, x_{16}\}, \quad A_3 = \{x_{15}, x_{14}, \ldots, x_8\}, \quad A_4 = \{x_7, x_6, \ldots, x_0\}.$$

The terms of degree 16 included in $\tilde{y}_R(x)$ are products of four terms of degree 4 included in the output of the 2-nd round function. The terms in the output of the 2-nd round function consist of terms with respect to only $A_1$, terms with respect to only $A_2$, terms with respect to only $A_3$, terms with respect to only $A_4$, and constant terms depending on $k$. Covering $x_{16} \in A_2$, the seven variables $\{x_{14}, \ldots, x_8\} \subset A_3$, and the eight variables $\{x_7, \ldots, x_0\} \subset A_4$ would require at least five such degree-4 terms (one in $A_2$, two in $A_3$, two in $A_4$), but a degree-16 term is a product of only four. Therefore, the terms of degree 16 that contain the variables $x_{16}$, $\{x_{14}, \ldots, x_8\}$, and $\{x_7, \ldots, x_0\}$ do not exist, and so they do not remain in the 15-th order differential of $\tilde{y}_R(x)$ for the 15-th order differences $\{e_{14}, e_{13}, \ldots, e_0\}$.

Second, consider the coefficient of the degree-15 term $c_1(k)\, x_{14} x_{13} \cdots x_0$ (Eq. (3)). We begin by considering the terms included in the output of the 2-nd round function. Since the input of the 2-nd round function is $x + f(k^{(1)}) + k^{(2)}$, the output of the 2-nd round function includes, as one example, the following terms, where $f_i: GF(2)^{32} \to GF(2)$ is the function which outputs the $i$-th bit of the output of $f$:

$$\text{degree-4: } (x_3 + f_3(k^{(1)}) + k^{(2)}_3)(x_2 + f_2(k^{(1)}) + k^{(2)}_2)(x_1 + f_1(k^{(1)}) + k^{(2)}_1)(x_0 + f_0(k^{(1)}) + k^{(2)}_0) \quad (5)$$
$$\text{degree-3: } (x_2 + f_2(k^{(1)}) + k^{(2)}_2)(x_1 + f_1(k^{(1)}) + k^{(2)}_1)(x_0 + f_0(k^{(1)}) + k^{(2)}_0) \quad (6)$$

The coefficients of the terms of degree 4 with respect to $X$ included in the output of the 2-nd round function are 1, if they exist, since they come from terms such as Eq. (5). The coefficients of the terms of degree 3 with respect to $X$ are the sum of the coefficients of the terms expanded from terms such as Eq. (5) and terms such as Eq. (6). The coefficient from the former is linear with respect to a key bit, and the coefficient from the latter is 1. The terms of degree 15 with respect to $X$ included in $\tilde{y}_R(x)$ are the products of three terms of degree 4 and one term of degree 3 included in the output of the 2-nd round function. From the discussion above, the coefficients of the terms of degree 15 are therefore represented as

$$\alpha_1 \times k_i + \alpha_0,$$

where $k_i$ is a key bit and $\alpha_1, \alpha_0 \in GF(2)$.

From similar considerations, the coefficients of the terms of degree 16 included in $\tilde{y}_R(x)$ are 0 or 1.

In conclusion, consider that

- the coefficient of the degree-15 term, $c_1(k)$, is $\alpha_1 k_i + \alpha_0$, where $\alpha_1, \alpha_0 \in GF(2)$, and
- the coefficient of the degree-16 term, $c_2(k)$, is 0 or 1.

Then the Boolean polynomial of each bit of $\Delta^{(15)}_{(e_{14},\ldots,e_0)}\tilde{y}_R(x)$ is one of the following:

$$\{x_{15} + k_i + 1,\ x_{15} + k_i,\ x_{15} + 1,\ x_{15},\ k_i + 1,\ k_i,\ 1,\ 0\}.$$

Moreover, when the degree-16 term $x_{15} x_{14} \cdots x_0$ exists, $\alpha_1$ is always 1, since the input of the 2-nd round function is $x + f(k^{(1)}) + k^{(2)}$. Therefore, each bit of $\Delta^{(15)}_{(e_{14},\ldots,e_0)}\tilde{y}_R(x)$ is one of the following:

$$\{x_{15} + k_i + 1,\ x_{15} + k_i,\ k_i + 1,\ k_i,\ 1,\ 0\}. \quad (7)$$

This proves that if we choose $\{e_{14}, \ldots, e_0\}$ for the 15-th order differences, each bit of the 15-th order differential of $\tilde{y}_R(x)$ is constant or linear with respect to a key bit. □
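The first step of the proof says that choosing unit vectors $e_j$ as differences is the same as differentiating with respect to the variables $x_j$. In that case, only monomials containing all of those variables survive. The sketch below checks this on a pseudo-random 6-variable Boolean function of our own choosing. It compares the higher order differential for $\{e_j : j \in S\}$ at base $b$ with the XOR of the ANF monomials that contain every $x_j$ ($j \in S$), each evaluated at $b$ on its remaining variables.

```python
from typing import List


def anf(truth: List[int]) -> List[int]:
    """Moebius transform: truth table -> ANF coefficients (index = monomial bitmask)."""
    coeffs = list(truth)
    step = 1
    while step < len(coeffs):
        for m in range(len(coeffs)):
            if m & step:
                coeffs[m] ^= coeffs[m ^ step]
        step <<= 1
    return coeffs


def unit_diff(truth: List[int], positions: List[int], b: int) -> int:
    """Higher order differential for the differences {e_j : j in positions} at b."""
    points = [0]
    for j in positions:
        points += [p ^ (1 << j) for p in points]
    acc = 0
    for v in points:
        acc ^= truth[v ^ b]
    return acc


def from_monomials(coeffs: List[int], positions: List[int], b: int) -> int:
    """XOR of the monomials containing every x_j (j in positions), evaluated at b."""
    s = sum(1 << j for j in positions)
    acc = 0
    for m, c in enumerate(coeffs):
        if c and (m & s) == s and ((m & ~s) & ~b) == 0:
            acc ^= 1
    return acc


TRUTH = [((x * 0x9E3779B1) >> 7) & 1 for x in range(64)]  # arbitrary 6-variable function
COEFFS = anf(TRUTH)
print(all(unit_diff(TRUTH, [0, 2, 3], b) == from_monomials(COEFFS, [0, 2, 3], b)
          for b in range(64)))
```

In the lemma, the same mechanism leaves only the monomials $x_{14} \cdots x_0$ and $x_{15} x_{14} \cdots x_0$ of $\tilde{y}_R(x)$ in play.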

From Eq. (7) in the proof of Lemma 1, the following corollary is proved.
Table 1. The number of constant bits of the 15-th order differential of $\tilde{y}_R(x)$ for some chosen differences

differences        # of constant bits      differences        # of constant bits
{E1, E2} \ e_i     15                      {E2, E3} \ e_i     11
{E1, E3} \ e_i     13                      {E2, E4} \ e_i     14
{E1, E4} \ e_i     12                      {E3, E4} \ e_i      9

Corollary 1. If we choose $\{e_{14}, \ldots, e_0\}$ for the 15-th order differences, some bits of the 15-th order differential of $\tilde{y}_R(x)$ are constant, and the XOR of any two of the remaining bits is also constant.

A similar corollary is proved for the following 15-th order differences $\{a_{15}, \ldots, a_1\}$:

$$\{a_{15}, \ldots, a_1\} = \bigl(\text{union of two of the sets } E_1, E_2, E_3, E_4\bigr) \setminus (\text{one of the chosen } e_i\text{'s}) \quad (8)$$

where $E_1 = \{e_{31}, e_{30}, \ldots, e_{24}\}$, $E_2 = \{e_{23}, e_{22}, \ldots, e_{16}\}$, $E_3 = \{e_{15}, e_{14}, \ldots, e_8\}$, $E_4 = \{e_7, e_6, \ldots, e_0\}$, and “\” denotes exclusion from the set.

Experimental verification. We computed by computer experiment the 15-th order differential of $\tilde{y}_R(x)$ for all 15-th order differences represented by Eq. (8). Table 1 shows the number of constant bits in the 15-th order differential of $\tilde{y}_R(x)$. Note that $\{E_1, E_2\} \setminus e_i$ denotes the set $\{E_1, E_2\}$ excluding an arbitrary difference $e_i$ in $\{E_1, E_2\}$. The number of constant bits and their bit positions do not depend on the excluded $e_i$. This is obvious from the fact that the constant does not depend on $x_i$.
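The experiment behind Table 1 amounts to computing a higher order differential under many keys and checking which output bits never change. The sketch below shows that bookkeeping on a toy keyed function. The function, the 8-bit width and the random key sample are our assumptions, not the paper's setup. Agreement over a sample of keys only suggests that a bit is key-independent; it does not prove it.

```python
import random
from typing import Callable, List

KeyedFunc = Callable[[int, int], int]  # (key, x) -> output word


def higher_order_diff(func: Callable[[int], int], diffs: List[int], b: int) -> int:
    """XOR of func over the coset spanned by diffs through b (Theorem 1)."""
    points = [0]
    for a in diffs:
        points += [p ^ a for p in points]
    acc = 0
    for v in points:
        acc ^= func(v ^ b)
    return acc


def constant_bit_mask(keyed: KeyedFunc, diffs: List[int], b: int,
                      keys: List[int], width: int) -> int:
    """Bits of the higher order differential that take the same value for every key."""
    values = [higher_order_diff(lambda x, k=k: keyed(k, x), diffs, b) for k in keys]
    mask = (1 << width) - 1
    for v in values[1:]:
        mask &= ~(v ^ values[0])
    return mask & ((1 << width) - 1)


def toy_keyed(k: int, x: int) -> int:
    """Hypothetical keyed function: a quadratic 8-bit map applied to x + k."""
    y = x ^ k
    rot1 = ((y << 1) | (y >> 7)) & 0xFF
    rot2 = ((y << 2) | (y >> 6)) & 0xFF
    return y ^ (rot1 & rot2)


rng = random.Random(2024)
sample = [rng.randrange(256) for _ in range(64)]
mask2 = constant_bit_mask(toy_keyed, [0x01, 0x02], 0x00, sample, 8)
print(f"2nd order: {bin(mask2).count('1')} constant bits")
```

Because the toy map is quadratic, every bit of its 2nd order differential is key-independent, so the mask is full. For $\tilde{y}_R(x)$ of the 5-round CAST cipher, the same loop would report the partial counts that Table 1 lists.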

[When the order of the higher order differential is 14] For the 14-th order differential of $\tilde{y}_R(x)$, it is shown that if we choose $\{e_{13}, e_{12}, \ldots, e_0\}$ for the 14-th order differences, each bit of the 14-th order differential of $\tilde{y}_R(x)$ is of degree at most 2 with respect to key bits. Table 2 shows the degree with respect to the key of each bit of the 14-th order differential of $\tilde{y}_R(x)$ for some chosen differences. Note that the 14-th order differences $\{a_{14}, a_{13}, \ldots, a_1\}$ are chosen from $\{e_{31}, e_{30}, \ldots, e_{16}\}$. The column “differences” in Table 2 holds the XORed values $a_{14} + a_{13} + \cdots + a_1$. Table 2 shows that some bits of the 14-th order differential of $\tilde{y}_R(x)$ are constant, i.e., independent of the key, a fact that is exploited in the improved higher order differential attack.

[When the order of the higher order differential is 13] For the 13-th order differential of $\tilde{y}_R(x)$, it is shown that if we choose $\{e_{12}, e_{11}, \ldots, e_0\}$ for the 13-th order differences, each bit of the 13-th order differential of $\tilde{y}_R(x)$ is of degree
Table 2. The degree of each bit of the 14-th order differential of $\tilde{y}_R(x)$

differences        degree with respect to the key, bit positions 31 ... 0
                   31 ... 24         23 ... 16         15 ... 8          7 ... 0
1111111111111100   1 2 2 2 2 2 2 2   2 0 0 1 2 2 2 1   2 0 1 2 2 1 1 2   2 0 1 1 0 1 2 1
1111111111111010   1 2 2 2 2 2 2 2   2 1 0 1 2 2 2 1   2 1 0 2 2 1 1 2   2 0 1 1 1 1 2 2
1111111111110110   1 2 2 2 2 2 2 2   2 0 0 1 2 2 2 1   2 1 1 2 2 1 1 2   2 1 0 1 1 1 2 ?
1111111111101110   1 2 2 2 2 2 2 2   2 0 1 1 2 2 2 1   2 0 0 2 2 1 1 2   2 0 1 1 1 1 2 1

Table 3. The bit-position of constant bits of the 13-th order differential of $\tilde{y}_R(x)$

differences        constant bits of the 13-th order differential of $\tilde{y}_R(x)$ (bit positions 31 ... 0)
1111111111111000   0
1111111111100011   0
1111110001111111   0 0


3 or less with respect to key bits. Table 3 shows the bit positions where the 13-th order differential of $\tilde{y}_R(x)$ is constant for some chosen differences. Note that the 13-th order differences $\{a_{13}, a_{12}, \ldots, a_1\}$ are chosen as 13 differences from $\{e_{31}, e_{30}, \ldots, e_{16}\}$. The column “differences” in Table 3 holds the XORed values $a_{13} + a_{12} + \cdots + a_1$. Similarly, Table 3 shows that some bits of the 13-th order differential of $\tilde{y}_R(x)$ are constant, i.e., independent of the key, a fact that is exploited in the improved higher order differential attack.



4.2 Construct the attack equation and find the last round key 

In this section we construct attack equation (1) using the higher order differences found in Step 1 (Section 4.1), and find the last round key $k^{(5)}$. If we find the last round key by checking all $2^{32}$ possible candidates exhaustively as shown in [ ], higher order differential attacks are possible using the 13-th order differences given in Section 4.1. The required number of chosen p/c pairs is only $2^{13}$, and the required complexity is about $\frac{1}{2} \times 2^{13} \times 2^{32} = 2^{44}$ times the computation of the round function on average (see new result (II) in Table 4).

If we find the last round key by solving attack equation (1) algebraically as shown in [ ], the required complexity can be reduced, though the required number of chosen p/c pairs increases slightly. Hereafter, let $k^{(5)} = (k_{31}, k_{30}, \ldots, k_0)$ denote the last round key, and define the set of variables $K^{(5)} = \{k_{31}, k_{30}, \ldots, k_0\}$. According to reference [ ], the degree of attack equation (1) is 3 with respect to $K^{(5)}$, so we have to solve algebraic equations of degree 3 in 32 unknown variables. If we transform them into a system of linear equations by regarding every monomial in $K^{(5)}$ that appears in attack equation (1) as an independent unknown variable, all variables in $K^{(5)}$ can be determined uniquely. The number of unknown variables is 368 [ , Section 4.2], so we have to prepare 368 equations.
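Once every monomial of degree at most 3 in $K^{(5)}$ is treated as a new unknown, the attack equations become a linear system over GF(2). Below is a minimal Gauss-Jordan solver for such a linearised system. The bitmask encoding of equations is our choice. The sketch does not attempt to build the 368 actual equations from the CAST S-boxes; the small system at the end is a hypothetical toy.

```python
from typing import List, Optional


def gf2_solve(rows: List[int], rhs: List[int], n_vars: int) -> Optional[List[int]]:
    """Solve A u = c over GF(2).

    rows[i] is equation i as a bitmask over the unknowns (bit j <-> unknown j),
    rhs[i] its constant. Returns the unique solution, or None if the system is
    rank-deficient or inconsistent.
    """
    rows, rhs = list(rows), list(rhs)
    r = 0
    for col in range(n_vars):
        pivot = next((i for i in range(r, len(rows)) if (rows[i] >> col) & 1), None)
        if pivot is None:
            return None  # this unknown is not determined: more equations are needed
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rhs[r], rhs[pivot] = rhs[pivot], rhs[r]
        for i in range(len(rows)):
            if i != r and (rows[i] >> col) & 1:
                rows[i] ^= rows[r]
                rhs[i] ^= rhs[r]
        r += 1
    if any(rhs[i] for i in range(r, len(rows))):
        return None  # a leftover equation reduced to 0 = 1
    return rhs[:n_vars]


def parity(v: int) -> int:
    return bin(v).count("1") & 1


# Toy check: 4 linearised unknowns and a hypothetical secret assignment.
secret = 0b1011
eqs = [0b0001, 0b0011, 0b0111, 0b1111, 0b0101]
consts = [parity(e & secret) for e in eqs]
print(gf2_solve(eqs, consts, 4))  # bits of the secret, least significant first
```

In the attack, the unknowns with a single key bit give $K^{(5)}$ directly. The higher-degree unknowns can then serve as a consistency check.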
Table 4. Required # of chosen p/c pairs and complexity for attacking a 5-round CAST cipher

Attacks                              # of p/c pairs   complexity
Jakobsen & Knudsen [ ]               2^17             2^48
Moriai, Shimoyama & Kaneko [ ]       2^17             2^25
New result (I)                       2^16             < 2^24
New result (II)                      2^13             2^44


If we use one of the 15-th order differences given in Section 4.1, we can obtain 32 equations (equations for 32 bits) using $2^{15}$ chosen p/c pairs. For the remaining 368 − 32 = 336 equations, we can choose 15 different sets of $2^{14}$ chosen p/c pairs from the same $2^{15}$ p/c pairs, but according to Table 2 it seems difficult to prepare 336 equations this way. Therefore, we use arbitrary 16-th order differences and obtain 32 equations using $2^{16}$ chosen p/c pairs. We obtain the remaining 336 equations using some 15-th order differences given in Section 4.1, which we can choose from the same $2^{16}$ p/c pairs. In this case, the required number of chosen p/c pairs is only $2^{16}$, and the required complexity is less than $2^{24}$ times the computation of the round function (see new result (I) in Table 4). The derivation of the required complexity is explained in reference [ ]. Table 4 shows new results on the number of p/c pairs and the complexity required for attacking a 5-round CAST cipher, and compares them with previous results.

5 Conclusion 

This paper introduced an improved higher order differential attack using chosen 
higher order differences. We demonstrated a higher order differential attack of 
a CAST cipher with 5 rounds using chosen higher order differences with fewer 
chosen p/c pairs and less complexity than the previous results. It is open whether 
the attack can be extended beyond 5 rounds. The target cipher is an example 
of a family of symmetric ciphers constructed using the CAST design procedure. 
CAST-128, which is used in several commercial applications, e.g., PGP5.0, has 
a stronger round function and more rounds, hence the improved higher order 
differential attack seems difficult to mount against CAST-128. 

We are working on how to find the lowest order of the higher order differential, which will lead to provable security against higher order differential attacks.
Abstract. Maximum average of differential probability is one of the security measures used to evaluate block ciphers such as the MISTY cipher. Here “average” means the average over all keys. Thus, there may be keys which yield a larger maximum differential probability even if the maximum average of differential probability is sufficiently small.

This paper presents the cases in which the maximum differential proba- 
bility is larger than the maximum average of differential probability for 
some keys, and we try to determine the maximum differential probability 
considering the key effect. 

Keywords: Differential cryptanalysis, linear cryptanalysis, differential, 
maximum average of differential probability, linear hull, maximum aver- 
age of linear probability, maximum non-averaged differential probability, 
maximum non-averaged linear probability, DES-like cipher 



1 Introduction 

The security of symmetric key block ciphers against differential cryptanalysis [ ] can be evaluated using several security measures. The maximum average of differential probability [ ] is one such measure. We can regard differential cryptanalysis as failing against a block cipher if the maximum average of differential probability for the block cipher is sufficiently small. Thus, designers of a block cipher should guarantee that the maximum average of differential probability is sufficiently small. Knudsen et al. showed that some block ciphers have sufficiently small maximum averages of differential probability [ ].

It is important to note that the “average” in the maximum average of differential probability means the average over all keys. That is, even if the maximum average of differential probability is sufficiently small, the block cipher may be insecure for some keys. Canteaut evaluated the maximum differential probability for all keys, not just the average over all keys, for some types of DES-like ciphers [ ]. However, the proof of her main theorem was flawed.

This paper points out a flaw in the proof of [ ], and extends the theorems that have correct proofs to linear cryptanalysis [ ]. Our conclusion is that inequalities similar to those of [ ] hold if the number of rounds is less than 3. Moreover, we report experimental results. The results show that more rigorous inequalities may be proved on non-averaged differential probability for a specific F-function.

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 1999. 

© Springer-Verlag Berlin Heidelberg 1999 
2 Preliminaries

We define the following. 

1. $\mathrm{Prob}[A \mid B] = 0$ if $B$ is an empty set.

2. Operations not specified here are as given in $GF(2)^i$.

3. Suffixes $L$ and $R$ denote the left and right halves, respectively, of the variable they are attached to, regarded as a bit string.

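We define the following notations.

$\mathrm{dom}\, f$: the domain of function $f$
$a \cdot b$: even parity of the bitwise logical AND of bit strings $a$ and $b$
$\Delta_f(a, b)$: $\{x \mid f(x) + f(x + a) = b\}$
$\delta_f(a, b)$: $\#\Delta_f(a, b)$
$\delta_f^{\max}$: $\max_{a \ne 0,\, b} \delta_f(a, b)$
$\Delta A$: $A + A^*$
$\Lambda_f(a, b)$: $\{x \mid x \cdot a = f(x) \cdot b\}$
$\lambda_f(a, b)$: $2\#\Lambda_f(a, b) - \#\,\mathrm{dom}\, f$
$\lambda_f^{\max}$: $\max_{a,\, b \ne 0} \lambda_f(a, b)^2$
$(a, b)$: concatenation of bit strings $a$ and $b$
$\exp_h(e)$: $h^e$
$S(A)$: $\{f : A \to A \mid f \text{ bijective}\}$
$C(f)$: equivalence class of $f$
$T/\!\sim$: quotient set of set $T$ by the equivalence relation $\sim$, i.e., $\{C(f) \mid f \in T\}$
$\mathcal{P}(T)$: power set of $T$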

We define the precedence of operations as follows:

$\Delta \succ \cdot \succ +$

We define the cipher $E_{k_1,k_2,\ldots,k_r}$, the analysis target, as follows.

1. $L(i)$, $R(i)$: $n$-bit values
2. $Y(0)$: plaintext ($2n$-bit)
3. $(L(i), R(i)) = Y(i)$
4. $L(i+1) = R(i)$ and $R(i+1) = L(i) + Z(i)$ for $0 \le i < r$

We define the F-function as $Z(i) = F(R(i)) = f(R(i) + k_{i+1})$, and $f$ as bijective.
Moreover, we define

$$\delta_E^{\max} = \max_{k_1,k_2,\ldots,k_r} \delta^{\max}_{E_{k_1,k_2,\ldots,k_r}}, \qquad \lambda_E^{\max} = \max_{k_1,k_2,\ldots,k_r} \lambda^{\max}_{E_{k_1,k_2,\ldots,k_r}}$$

for the block cipher $E$.
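The counting quantities just defined can be computed directly for small bijections given as lookup tables. In the sketch below we read $\lambda_f^{\max}$ as a maximum over $b \ne 0$; that reading is our assumption. The tables are the three F-functions of Examples 1 to 3 in Sect. 5.3, used here only as test data.

```python
from collections import Counter
from typing import List


def dot(a: int, b: int) -> int:
    """a . b: even parity of the bitwise AND of bit strings a and b."""
    return bin(a & b).count("1") & 1


def delta(f: List[int], a: int, b: int) -> int:
    """delta_f(a, b) = #{x | f(x) + f(x + a) = b}."""
    return sum(1 for x in range(len(f)) if f[x] ^ f[x ^ a] == b)


def delta_max(f: List[int]) -> int:
    """delta_f^max: maximum of delta_f(a, b) over a != 0 and all b."""
    return max(max(Counter(f[x] ^ f[x ^ a] for x in range(len(f))).values())
               for a in range(1, len(f)))


def lam(f: List[int], a: int, b: int) -> int:
    """lambda_f(a, b) = 2 #{x | x.a = f(x).b} - #dom f."""
    hits = sum(1 for x in range(len(f)) if dot(x, a) == dot(f[x], b))
    return 2 * hits - len(f)


def lam_max(f: List[int]) -> int:
    """lambda_f^max = max over all a and b != 0 of lambda_f(a, b)^2 (our reading)."""
    return max(lam(f, a, b) ** 2 for a in range(len(f)) for b in range(1, len(f)))


F_EX1 = [0, 1, 2, 3, 4, 8, 15, 11, 7, 12, 6, 13, 5, 10, 14, 9]  # Example 1 (n = 4)
F_EX2 = [0, 1, 3, 7, 6, 2, 5, 4]                                 # Example 2 (n = 3)
F_EX3 = [0, 1, 2, 6, 7, 5, 4, 3]                                 # Example 3 (n = 3)
print(delta_max(F_EX1), delta_max(F_EX2), delta_max(F_EX3), lam_max(F_EX1))
```

The first three values are 4, 8 and 2, i.e. $\delta_f^{\max}/2^n$ equals 1/4, 1 and 2/2^3 for the three examples.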
The following lemma is useful for evaluating linear probability.

Lemma 1. The following equation holds for an $n$-input Boolean function $f: GF(2)^n \to GF(2)$:

$$\frac{1}{2^n} \sum_{x \in GF(2)^n} \exp_{-1}(f(x)) = 2\,\mathrm{Prob}[f(x) = 0] - 1$$
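Lemma 1 is easy to confirm exhaustively for small $n$. The sketch below is our own check, not part of the paper. It compares both sides with exact rational arithmetic for every Boolean function of 3 variables, given as a truth table.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence


def walsh_side(truth: Sequence[int]) -> Fraction:
    """(1 / 2^n) * sum over x of exp_{-1}(f(x)) = (-1)^{f(x)}."""
    return Fraction(sum(1 - 2 * v for v in truth), len(truth))


def prob_side(truth: Sequence[int]) -> Fraction:
    """2 Prob[f(x) = 0] - 1 for x uniform over GF(2)^n."""
    return 2 * Fraction(list(truth).count(0), len(truth)) - 1


# Exhaustive check over all 2^(2^3) Boolean functions of 3 variables.
print(all(walsh_side(t) == prob_side(t) for t in product((0, 1), repeat=8)))
```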

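3 Previous Results

3.1 Averaged Case

Nyberg and Knudsen showed a bound on the maximum average of differential probability for $r \ge 3$ [ ], and Matsui pointed out that a similar inequality holds in linear cryptanalysis using duality [ ]. Later, Aoki and Ohta showed the strict bounds for a bijective F-function [ ]¹.

Lemma 2 ([ ]).

$$\max_{\alpha \ne 0,\, \beta}\ \mathop{\mathrm{Average}}_{k_1,k_2,\ldots,k_r} \delta_{E_{k_1,k_2,\ldots,k_r}}(\alpha, \beta) \le (\delta_f^{\max})^2,$$
$$\max_{\alpha,\, \beta \ne 0}\ \mathop{\mathrm{Average}}_{k_1,k_2,\ldots,k_r} \lambda_{E_{k_1,k_2,\ldots,k_r}}(\alpha, \beta)^2 \le (\lambda_f^{\max})^2 \qquad \text{for } r \ge 3.$$

3.2 Non-Averaged Case

Canteaut showed some results on differential probability as dependent on keys [ ].

Lemma 3 (2-round differential probability).

$$\mathop{\mathrm{Prob}}_{Y(0),\,Y^*(0)}\bigl[\Delta Y(2) = \beta \mid \Delta Y(0) = \alpha\bigr] = \frac{\delta_f(\alpha_R, \alpha_L + \beta_L)\,\delta_f(\beta_L, \alpha_R + \beta_R)}{2^{2n}}$$

Lemma 4 (3-round differential probability with trivial 1-st round).

$$\mathrm{Prob}\bigl[\Delta Y(3) = \beta \mid \Delta Y(0) = (\alpha_L, 0)\bigr] = \frac{\delta_f(\alpha_L, \beta_L)\,\delta_f(\beta_L, \alpha_L + \beta_R)}{2^{2n}}$$

Lemma 5 (3-round differential probability with trivial 2-nd round).

$$\mathop{\mathrm{Prob}}_{Y(0),\,Y^*(0)}\bigl[\Delta Y(3) = (\alpha_R, \beta_R) \mid \Delta Y(0) = \alpha\bigr] = \frac{\delta_f(\alpha_R, \alpha_L)\,\delta_f(\alpha_R, \beta_R)}{2^{2n}}$$

¹ Kaneko et al. showed similar inequalities for a general F-function [ ].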
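The 2-round formula can be verified exhaustively for $n = 3$. The sketch below encrypts every plaintext with two rounds of $E$ ($L(i+1) = R(i)$, $R(i+1) = L(i) + f(R(i) + k_{i+1})$). It then checks that, for every $\alpha$ and $\beta$, the number of plaintexts with $\Delta Y(2) = \beta$ equals $\delta_f(\alpha_R, \alpha_L + \beta_L)\,\delta_f(\beta_L, \alpha_R + \beta_R)$, i.e. the 2-round probability multiplied by $2^{2n}$. The random bijection and the key values are hypothetical test inputs.

```python
import random
from collections import Counter
from typing import List, Tuple


def two_round(f: List[int], k1: int, k2: int, left: int, right: int) -> Tuple[int, int]:
    """Two rounds of E: L(i+1) = R(i), R(i+1) = L(i) + f(R(i) + k_{i+1})."""
    left, right = right, left ^ f[right ^ k1]
    left, right = right, left ^ f[right ^ k2]
    return left, right


def check_lemma3(f: List[int], k1: int, k2: int) -> bool:
    size = len(f)
    ddt = [[0] * size for _ in range(size)]  # ddt[a][b] = delta_f(a, b)
    for a in range(size):
        for x in range(size):
            ddt[a][f[x] ^ f[x ^ a]] += 1
    out = {(l, r): two_round(f, k1, k2, l, r) for l in range(size) for r in range(size)}
    for al in range(size):
        for ar in range(size):
            counts = Counter()
            for (l, r), (yl, yr) in out.items():
                zl, zr = out[(l ^ al, r ^ ar)]
                counts[(yl ^ zl, yr ^ zr)] += 1
            for bl in range(size):
                for br in range(size):
                    # Lemma 3 multiplied through by 2^{2n}
                    if counts[(bl, br)] != ddt[ar][al ^ bl] * ddt[bl][ar ^ br]:
                        return False
    return True


rng = random.Random(7)
f3 = list(range(8))
rng.shuffle(f3)  # a random bijective f for n = 3 (hypothetical)
print(check_lemma3(f3, k1=5, k2=2))
```

The check passes for every key pair because two rounds of a Feistel cipher expose no key dependence in the differential counts. The flaw discussed in Sect. 5.1 only appears from three rounds on.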
4 Extension to Linear Cryptanalysis

We can prove the lemmas in Sect. 3.2 for the case of linear probability using Lemma 1. We state the lemmas here and prove them in the Appendices.

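Lemma 6 (2-round linear probability).

$$2\,\mathrm{Prob}[Y(0) \cdot \alpha = Y(2) \cdot \beta] - 1 = \exp_{-1}\bigl(k_1 \cdot (\alpha_R + \beta_R) + k_2 \cdot (\alpha_L + \beta_L)\bigr) \times \frac{\lambda_f(\alpha_R + \beta_R, \alpha_L)\,\lambda_f(\alpha_L + \beta_L, \beta_R)}{2^{2n}}$$

Lemma 7 (3-round linear probability with trivial 3-rd round).

$$2\,\mathrm{Prob}[Y(0) \cdot \alpha = Y(3) \cdot (\beta_L, 0)] - 1 = \exp_{-1}\bigl(k_1 \cdot (\alpha_R + \beta_L) + k_2 \cdot \alpha_L\bigr) \times \frac{\lambda_f(\alpha_R + \beta_L, \alpha_L)\,\lambda_f(\alpha_L, \beta_L)}{2^{2n}}$$

Lemma 8 (3-round linear probability with trivial 2-nd round).

$$2\,\mathrm{Prob}[Y(0) \cdot \alpha = Y(3) \cdot (\beta_L, \alpha_L)] - 1 = \exp_{-1}\bigl(k_1 \cdot \alpha_R + k_3 \cdot \beta_L\bigr) \times \frac{\lambda_f(\alpha_R, \alpha_L)\,\lambda_f(\beta_L, \alpha_L)}{2^{2n}}$$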
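The 2-round linear formula can be checked exhaustively in the same way. Below, both sides are multiplied by $2^{2n}$. The left side sums $\exp_{-1}(Y(0)\cdot\alpha + Y(2)\cdot\beta)$ over all plaintexts. The right side is $\exp_{-1}(k_1\cdot(\alpha_R+\beta_R) + k_2\cdot(\alpha_L+\beta_L))\,\lambda_f(\alpha_R+\beta_R, \alpha_L)\,\lambda_f(\alpha_L+\beta_L, \beta_R)$. The random bijection and key values are hypothetical test inputs for $n = 3$.

```python
import random
from typing import List, Tuple


def dot(a: int, b: int) -> int:
    """a . b: parity of the bitwise AND."""
    return bin(a & b).count("1") & 1


def lam(f: List[int], a: int, b: int) -> int:
    """lambda_f(a, b) = 2 #{x | x.a = f(x).b} - 2^n = sum_x (-1)^{x.a + f(x).b}."""
    return sum(1 - 2 * (dot(x, a) ^ dot(f[x], b)) for x in range(len(f)))


def lhs(f: List[int], k1: int, k2: int, alpha: Tuple[int, int], beta: Tuple[int, int]) -> int:
    """2^{2n} (2 Prob[Y(0).alpha = Y(2).beta] - 1), by exhaustive encryption."""
    size, total = len(f), 0
    for l0 in range(size):
        for r0 in range(size):
            l1, r1 = r0, l0 ^ f[r0 ^ k1]
            l2, r2 = r1, l1 ^ f[r1 ^ k2]
            e = dot(l0, alpha[0]) ^ dot(r0, alpha[1]) ^ dot(l2, beta[0]) ^ dot(r2, beta[1])
            total += 1 - 2 * e
    return total


def rhs(f: List[int], k1: int, k2: int, alpha: Tuple[int, int], beta: Tuple[int, int]) -> int:
    """Right-hand side of Lemma 6, multiplied through by 2^{2n}."""
    (al, ar), (bl, br) = alpha, beta
    sign = 1 - 2 * (dot(k1, ar ^ br) ^ dot(k2, al ^ bl))
    return sign * lam(f, ar ^ br, al) * lam(f, al ^ bl, br)


rng = random.Random(11)
f3 = list(range(8))
rng.shuffle(f3)  # a random bijective f for n = 3 (hypothetical)
alpha, beta = (3, 5), (6, 1)
print(lhs(f3, 4, 7, alpha, beta), rhs(f3, 4, 7, alpha, beta))
```

Unlike the differential case, the key enters through the sign factor. The absolute value of the correlation is key-independent, while its sign is not.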



5 Computer Evaluations 



5.1 Extension Trial to General Rounds 



Canteaut evaluated DES-like ciphers with general rounds similarly to Sect. 3.2. Unfortunately, the results contain errors. First, she described

$$P[\Delta Y(3) = (\beta_L, \beta_R) \mid \Delta Y(0) = (\alpha_L, \alpha_R), K_1 = k_1, K_2 = k_2, K_3 = k_3]$$
$$= \sum_d P[\Delta Z(2) = \beta_R + d \mid \Delta R(2) = \beta_L, \Delta Y(0) = (\alpha_L, \alpha_R), K_1 = k_1, K_2 = k_2, K_3 = k_3] \times \frac{\delta_f(\alpha_R, d + \alpha_L)\,\delta_f(d, \beta_L + \alpha_R)}{2^{2n}}$$

in [ , Theorem 1]. The conditional part of the probability on the right side of this equality is missing $\Delta Y(2) = (d, \beta_L)$, so this equality does not hold². In addition, we believe that the induction part of the proof, which she did not describe, has the same error.

² Her probability formulas in [ ] do not contain information on random variables, so her proofs are hard to follow. We could not verify the correctness of the proofs of Propositions 2 and 3 in [ ], so these proofs may be flawed; however, we confirmed that the statements of these propositions are correct.
5.2 Preliminaries

We tried to prove [ , Theorem 1], but computer-based approaches seemed more feasible.

We used computers to evaluate the differential probabilities for all $f$ in the case that $n$ is small, and for randomly generated $f$ in the case that $n$ is not small.

The evaluations consider all round keys and all bijective functions for the F-function, so they are enormously complex.

The following lemmas are effective for decreasing the complexity of the computer evaluations. Proofs are in the Appendices.

Lemma 9. Let $h: \mathcal{P}(GF(2)^{2n} \times GF(2)^{2n}) \to T$ be a measure whose value on the plaintext/ciphertext pairs $\{(Y(0), Y(r)) \mid Y(r) = E(Y(0))\}$ of an $r$-round cipher satisfies

$$\forall \mu, \nu\ \bigl[h(\{(Y(0) + \mu, Y(r) + \nu) \mid Y(r) = E(Y(0))\}) = h(\{(Y(0), Y(r)) \mid Y(r) = E(Y(0))\})\bigr].$$

Then, even if we change one of any even round key and one of any odd round key in an arbitrary manner, we can obtain the same value of $h$, an element of $T$, by adjusting the other keys.

Corollary 1. $\delta^{\max}_{E_{k_1,k_2,\ldots,k_r}}$ is independent of $(k_1, k_2)$.

Corollary 2. $\lambda^{\max}_{E_{k_1,k_2,\ldots,k_r}}$ is independent of $(k_1, k_2)$.

These corollaries suggest that to evaluate an $r$-round cipher, using all round keys is not necessary; considering only $r - 2$ round keys is sufficient.
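A brute-force evaluation of $\delta_E^{\max}$ that exploits Corollary 1 fixes $k_1 = k_2 = 0$ and loops only over the remaining $r - 2$ round keys. The sketch below is our own implementation under that assumption. It packs a $2n$-bit block as $(L \ll n) \mid R$ and uses the Example 1 permutation as test data.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence


def feistel_table(f: List[int], n: int, keys: Sequence[int]) -> List[int]:
    """Codebook of E_{k_1..k_r}; a 2n-bit block is packed as (L << n) | R."""
    mask = (1 << n) - 1
    table = []
    for y in range(1 << (2 * n)):
        left, right = y >> n, y & mask
        for k in keys:
            left, right = right, left ^ f[right ^ k]
        table.append((left << n) | right)
    return table


def delta_max(table: List[int]) -> int:
    """max over a != 0 and b of #{x | T(x) + T(x + a) = b} for a lookup table T."""
    size = len(table)
    return max(max(Counter(table[x] ^ table[x ^ a] for x in range(size)).values())
               for a in range(1, size))


def delta_E_max(f: List[int], n: int, r: int) -> int:
    """delta_E^max over all keys, with k_1 = k_2 = 0 fixed as Corollary 1 allows."""
    best = 0
    for tail in product(range(1 << n), repeat=r - 2):
        best = max(best, delta_max(feistel_table(f, n, (0, 0) + tail)))
    return best


F_EX1 = [0, 1, 2, 3, 4, 8, 15, 11, 7, 12, 6, 13, 5, 10, 14, 9]  # Example 1, n = 4
print(delta_max(F_EX1), delta_E_max(F_EX1, 4, 2))
```

For $r = 2$ the maximum is $2^n \cdot \delta_f^{\max} = 64$, reached by the differential with a trivial first round. That is why the bounds only start at three rounds. Calling `delta_E_max(F_EX1, 4, 3)` loops over the 16 values of $k_3$. Its result can be compared with the value 24 given in Example 1; we have not run this comparison.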

Moreover, since the evaluations consider all keys, it is sufficient to evaluate only one of $f$ and $g$ if $\forall x\, [f(x) = g(x) + k]$ holds. We introduce the following equivalence relation for this purpose.

Lemma 10.

$$f \sim g \iff \exists k \in GF(2)^n,\ \forall x \in GF(2)^n\ [f(x) = g(x) + k]$$

is an equivalence relation over $S(GF(2)^n)$.

Using this equivalence relation, it is trivial to show that it is sufficient to consider a complete set of representatives of $S(GF(2)^n)/\!\sim$.

Lemma 11. For any $x_0$, $y_0$,

$$\{f \in S(GF(2)^n) \mid f(x_0) = y_0\}$$

is a complete set of representatives of $S(GF(2)^n)/\!\sim$.

Using this lemma, it is sufficient for us to consider only the elements of $S(GF(2)^n)$ with, for example, $f(0) = 0$.
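Lemmas 10 and 11 can be confirmed by enumeration for tiny $n$. The sketch below maps each bijection $f$ to $f + f(0)$, the unique member of its class $C(f)$ that sends 0 to 0. It then checks that the classes number $(2^n)!/2^n$ and that each class contains exactly one representative with $f(0) = 0$. This is our own illustration of the reduction.

```python
from itertools import permutations
from math import factorial
from typing import Dict, List, Tuple

Perm = Tuple[int, ...]


def canonical(f: Perm) -> Perm:
    """The member g = f + f(0) of the class C(f); it satisfies g(0) = 0."""
    k = f[0]
    return tuple(v ^ k for v in f)


def classes(n: int) -> Dict[Perm, List[Perm]]:
    """Partition S(GF(2)^n) by f ~ g  <=>  f(x) = g(x) + k for some constant k."""
    out: Dict[Perm, List[Perm]] = {}
    for f in permutations(range(1 << n)):
        out.setdefault(canonical(f), []).append(f)
    return out


for n in (2, 3):
    cls = classes(n)
    one_rep_each = all(sum(1 for f in members if f[0] == 0) == 1 for members in cls.values())
    print(n, len(cls), factorial(1 << n) // (1 << n), one_rep_each)
```

For $n = 3$ this cuts the search from 40320 bijections to 5040 representatives. Together with Corollary 1, which removes two round keys, it keeps the exhaustive evaluations of Sect. 5.3 tractable.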
5.3 Experimental Results



We calculated the maximum differential probability of the F-function and the maximum differential probability of the cipher for ciphers with at least 3 rounds, using the lemmas of the previous sections.

We³ calculated all $f$ for the case that the number of bits of the F-function is 3, and randomly generated $f$ for the case that the number of bits of the F-function is greater than 3⁴. We show the results in Tables 1 and 2. “Ratio” here means the ratio of the number of F-functions which yield the pair $(\delta_f^{\max}, \delta_E^{\max})$ to all bijective functions (or to all randomly generated bijective functions in the case of Table 2).

In the tables, * denotes the items that do not satisfy the evaluation inequality of Lemma 2 when the maximum average of differential probability is replaced with the maximum differential probability, i.e., $\delta_E^{\max} \le (\delta_f^{\max})^2$. These tables show that this inequality is exceeded by a factor of up to 4.5 for some F-functions. However, these tables also show that the maximum differential probability is smaller than $(\delta_f^{\max}/2^n)^2$ for some F-functions.

We obtained the following interesting examples. 



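Example 1. The following example shows that the statement of [ , Theorem 1] does not hold.

n = 4, r = 3

x    : 0 1 2 3 4 5  6  7 8  9 10 11 12 13 14 15
f(x) : 0 1 2 3 4 8 15 11 7 12  6 13  5 10 14  9

$$\frac{\delta_f^{\max}}{2^4} = \frac{1}{4}, \qquad \frac{\delta_E^{\max}}{2^8} = \frac{24}{2^8} = 1.5 \times \left(\frac{\delta_f^{\max}}{2^4}\right)^2$$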



Example 2. The following example shows that the maximum differential probability of a cipher can be less than the square of the maximum differential probability of the F-function.

n = 3, r = 5

x    : 0 1 2 3 4 5 6 7
f(x) : 0 1 3 7 6 2 5 4

$$\frac{\delta_f^{\max}}{2^3} = 1, \qquad \frac{\delta_E^{\max}}{2^6} < \left(\frac{\delta_f^{\max}}{2^3}\right)^2$$



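Example 3. The following example shows that the maximum differential probability of a cipher can be 4.5 times greater than the square of the maximum differential probability of the F-function.

n = 3, r = 5

x    : 0 1 2 3 4 5 6 7
f(x) : 0 1 2 6 7 5 4 3

$$\frac{\delta_f^{\max}}{2^3} = \frac{2}{2^3}, \qquad \frac{\delta_E^{\max}}{2^6} = \frac{18}{2^6} = 4.5 \times \left(\frac{\delta_f^{\max}}{2^3}\right)^2$$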



³ We omit F-functions with fewer than 3 bits since those cases are trivial.

⁴ We calculated cases of n > 5; however, we could not find an interesting case.






Kazumaro Aoki 



Table 1. Differential probability of an F-function and differential probability of a cipher (n = 3)

n  r  δ_f^max  δ_E^max  Ratio
3  3   2          4     4/15
       4         16     7/15
       8         64     4/15
   4   2         *8     12/45
       4          8     6/45
                 16     14/45
                *32     1/45
       8         32     9/45
                 64     3/45
   5   2        *10     2/45
                *12     4/45
                *18     6/45
       4          8     6/45
                 16     12/45
                *32     2/45
                *48     1/45
       8         16     6/45
                 32     3/45
                 64     3/45
   6   2        *10     2/45
                *12     4/45
                *18     6/45
       4         12     6/45
                *18     12/45
                *32     2/45
                *48     1/45
       8         16     6/45
                 32     3/45
                 64     3/45
   7   2        *12     2/45
                *16     4/45
                *18     6/45
       4         12     6/45
                *18     12/45
                *32     2/45
                *48     1/45
       8         16     6/45
                 32     3/45
                 64     3/45

Table 2. Differential probability of an F-function and differential probability of a cipher (n > 3)

n  r  δ_f^max  δ_E^max  Ratio (%)
4  3   4         16      1.4
                *18      1.0
                *20      0.1
                *24      0.0
       6         36     49.2
       8         64     37.8
      10        100      8.7
      12        144      1.7
      16        256      0.1
5  3   4          ?      0.0
                *20      0.0
       6         36     21.0
       8         64     62.6
      10        100     14.7
      12        144      1.5
      14        196      0.1
      16        256      0.0
      18        324      0.0
There exist 16 differentials, each of which has differentially weak keys occupying 1/32 of the key space, and the weak keys of different differentials are all different. Thus, half of the key space of the cipher (16 × 1/32 = 1/2) consists of differentially weak keys.

6 Conclusion 

This paper has extended the evaluation of the maximum differential probability and the maximum linear probability with keys for DES-like ciphers whose F-function is bijective. As a result, strict evaluations were derived for 2-round and some 3-round ciphers. These results coincide with those obtained by evaluating the maximum average of differential probability and the maximum average of linear probability, the parameters used as security measures against differential cryptanalysis and linear cryptanalysis, respectively.

Moreover, we have evaluated the maximum differential probability with keys for ciphers of three or more rounds by computer. As a result, it was shown that there are cases in which the maximum differential probability is 4.5 times the maximum average of differential probability.

There are three open problems:

1. obtaining an evaluation for the general case;

2. characterizing the F-functions whose maximum differential probability with keys is small;

3. constructing key-scheduling design procedures that do not produce weak keys against differential cryptanalysis.
A Proof of Lemma ??

We prove this lemma using Fig. ??.



$$2\Pr[Y(0)\cdot\alpha = Y(2)\cdot\beta] - 1 \tag{2}$$

$$= \frac{1}{2^{2n}} \sum_{L(0),R(0)} (-1)^{L(0)\cdot\alpha_L + R(0)\cdot\alpha_R + (L(0)+f(R(0)+k_1))\cdot\beta_L + (R(0)+f(L(0)+f(R(0)+k_1)+k_2))\cdot\beta_R} \tag{3}$$

$$= \frac{1}{2^{2n}} \sum_{L(0),R(0)} (-1)^{R(0)\cdot(\alpha_R+\beta_R) + f(R(0)+k_1)\cdot\beta_L + L(0)\cdot(\alpha_L+\beta_L) + f(L(0)+f(R(0)+k_1)+k_2)\cdot\beta_R} \tag{4}$$

$$= \frac{1}{2^{2n}} \sum_{L(0),R(0)} (-1)^{R(0)\cdot(\alpha_R+\beta_R) + f(R(0)+k_1)\cdot\beta_L + (f(R(0)+k_1)+k_2)\cdot(\alpha_L+\beta_L) + (L(0)+f(R(0)+k_1)+k_2)\cdot(\alpha_L+\beta_L) + f(L(0)+f(R(0)+k_1)+k_2)\cdot\beta_R} \tag{5}$$

$$= \frac{1}{2^{2n}} \sum_{R(0)} (-1)^{R(0)\cdot(\alpha_R+\beta_R) + f(R(0)+k_1)\cdot\alpha_L + k_2\cdot(\alpha_L+\beta_L)} \times \sum_{L(0)} (-1)^{(L(0)+f(R(0)+k_1)+k_2)\cdot(\alpha_L+\beta_L) + f(L(0)+f(R(0)+k_1)+k_2)\cdot\beta_R} \tag{6}$$

$$= \frac{1}{2^{2n}} \sum_{R(0)} (-1)^{(R(0)+k_1)\cdot(\alpha_R+\beta_R) + f(R(0)+k_1)\cdot\alpha_L + k_1\cdot(\alpha_R+\beta_R) + k_2\cdot(\alpha_L+\beta_L)} \times \Lambda_f(\alpha_L+\beta_L, \beta_R) \tag{7}$$

$$= (-1)^{k_1\cdot(\alpha_R+\beta_R) + k_2\cdot(\alpha_L+\beta_L)} \, \frac{\Lambda_f(\alpha_R+\beta_R, \alpha_L)\, \Lambda_f(\alpha_L+\beta_L, \beta_R)}{2^{2n}} \tag{8}$$
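As a sanity check on equations (2) to (8), the brute-force sketch below compares both sides for small F-functions. It assumes the round map $(L, R) \mapsto (R, L + f(R + k))$ used in the derivation, the inner product $Y \cdot \alpha = L \cdot \alpha_L + R \cdot \alpha_R$ over GF(2), and the unnormalized $\Lambda_f(a, b) = \sum_x (-1)^{x \cdot a + f(x) \cdot b}$; the helper names are hypothetical.

```python
import random


def dot(a, b):
    """Inner product over GF(2): parity of the bitwise AND."""
    return bin(a & b).count("1") & 1


def lam(f, a, b):
    """Assumed definition: Lambda_f(a, b) = sum_x (-1)^(x.a + f(x).b)."""
    return sum(1 - 2 * (dot(x, a) ^ dot(f[x], b)) for x in range(len(f)))


def two_rounds(f, k1, k2, left, right):
    # Assumed round structure: (L, R) -> (R, L xor f(R xor k)).
    left, right = right, left ^ f[right ^ k1]
    left, right = right, left ^ f[right ^ k2]
    return left, right


def check_eq8(f, k1, k2):
    """Compare 2^(2n) * (2 Pr[Y(0).alpha = Y(2).beta] - 1) with the closed form (8)."""
    values = range(len(f))
    for aL in values:
        for aR in values:
            for bL in values:
                for bR in values:
                    total = 0
                    for L0 in values:
                        for R0 in values:
                            L2, R2 = two_rounds(f, k1, k2, L0, R0)
                            bit = dot(L0, aL) ^ dot(R0, aR) ^ dot(L2, bL) ^ dot(R2, bR)
                            total += 1 - 2 * bit
                    sign = 1 - 2 * (dot(k1, aR ^ bR) ^ dot(k2, aL ^ bL))
                    if total != sign * lam(f, aR ^ bR, aL) * lam(f, aL ^ bL, bR):
                        return False
    return True


rng = random.Random(2024)
f = list(range(8))  # a random bijective 3-bit F-function
rng.shuffle(f)
print(check_eq8(f, rng.randrange(8), rng.randrange(8)))  # True
```

Note that this identity does not use the bijectivity of $f$: the inner sum over $L(0)$ is a relabelling of its summation variable.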



B Proof of Lemma ??

We prove this lemma using Fig. ??.

$$2\Pr[Y(0)\cdot\alpha = Y(3)\cdot(\beta_L, 0)] - 1 \tag{9}$$

$$= \frac{1}{2^{2n}} \sum_{L(0),R(0)} (-1)^{L(0)\cdot\alpha_L + R(0)\cdot\alpha_R + (R(0)+f(L(0)+f(R(0)+k_1)+k_2))\cdot\beta_L} \tag{10}$$

$$= \frac{1}{2^{2n}} \sum_{L(0),R(0)} (-1)^{L(0)\cdot\alpha_L + R(0)\cdot(\alpha_R+\beta_L) + f(L(0)+f(R(0)+k_1)+k_2)\cdot\beta_L} \tag{11}$$

$$= \frac{1}{2^{2n}} \sum_{R(0)} (-1)^{R(0)\cdot(\alpha_R+\beta_L) + (f(R(0)+k_1)+k_2)\cdot\alpha_L} \times \sum_{L(0)} (-1)^{(L(0)+f(R(0)+k_1)+k_2)\cdot\alpha_L + f(L(0)+f(R(0)+k_1)+k_2)\cdot\beta_L} \tag{12}$$

$$= \frac{1}{2^{2n}} \sum_{R(0)} (-1)^{(R(0)+k_1)\cdot(\alpha_R+\beta_L) + f(R(0)+k_1)\cdot\alpha_L + k_1\cdot(\alpha_R+\beta_L) + k_2\cdot\alpha_L} \times \Lambda_f(\alpha_L, \beta_L) \tag{13}$$

$$= (-1)^{k_1\cdot(\alpha_R+\beta_L) + k_2\cdot\alpha_L} \, \frac{\Lambda_f(\alpha_R+\beta_L, \alpha_L)\, \Lambda_f(\alpha_L, \beta_L)}{2^{2n}} \tag{14}$$



C Proof of Lemma ??

We prove this lemma using Fig. ??.

$$2\Pr[Y(0)\cdot\alpha = Y(3)\cdot(\beta_L, \alpha_L)] - 1 \tag{15}$$

$$= \frac{1}{2^{2n}} \sum_{L(0),R(0)} (-1)^{L(0)\cdot\alpha_L + R(0)\cdot\alpha_R + (R(0)+f(L(0)+f(R(0)+k_1)+k_2))\cdot\beta_L + (L(0)+f(R(0)+k_1)+f(R(0)+f(L(0)+f(R(0)+k_1)+k_2)+k_3))\cdot\alpha_L} \tag{16}$$

$$= \frac{1}{2^{2n}} \sum_{L(0),R(0)} (-1)^{R(0)\cdot(\alpha_R+\beta_L) + f(L(0)+f(R(0)+k_1)+k_2)\cdot\beta_L + f(R(0)+k_1)\cdot\alpha_L + f(R(0)+f(L(0)+f(R(0)+k_1)+k_2)+k_3)\cdot\alpha_L} \tag{17}$$

$$= \frac{1}{2^{2n}} \sum_{R(0)} (-1)^{R(0)\cdot(\alpha_R+\beta_L) + f(R(0)+k_1)\cdot\alpha_L + (R(0)+k_3)\cdot\beta_L} \times \sum_{L(0)} (-1)^{(R(0)+f(L(0)+f(R(0)+k_1)+k_2)+k_3)\cdot\beta_L + f(R(0)+f(L(0)+f(R(0)+k_1)+k_2)+k_3)\cdot\alpha_L} \tag{18}$$

$$= \frac{1}{2^{2n}} \sum_{R(0)} (-1)^{(R(0)+k_1)\cdot\alpha_R + f(R(0)+k_1)\cdot\alpha_L + k_1\cdot\alpha_R + k_3\cdot\beta_L} \times \Lambda_f(\beta_L, \alpha_L) \quad (\text{since } f \text{ is bijective}) \tag{19}$$

$$= (-1)^{k_1\cdot\alpha_R + k_3\cdot\beta_L} \, \frac{\Lambda_f(\alpha_R, \alpha_L)\, \Lambda_f(\beta_L, \alpha_L)}{2^{2n}} \tag{20}$$

Fig. 2. 3-round linear probability with trivial 3rd round

Fig. 3. 3-round linear probability with trivial 2nd round
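Equations (14) and (20) can be checked the same way on a 3-round toy cipher. This sketch assumes the same round map $(L, R) \mapsto (R, L + f(R + k_i))$, the GF(2) inner product, and the unnormalized $\Lambda_f$ used above; the function names are hypothetical. Only (20) relies on $f$ being bijective, so the sketch uses random permutations.

```python
import random


def dot(a, b):
    return bin(a & b).count("1") & 1


def lam(f, a, b):
    """Assumed definition: Lambda_f(a, b) = sum_x (-1)^(x.a + f(x).b)."""
    return sum(1 - 2 * (dot(x, a) ^ dot(f[x], b)) for x in range(len(f)))


def rounds(f, keys, left, right):
    # Assumed round structure: (L, R) -> (R, L xor f(R xor k)), one key per round.
    for k in keys:
        left, right = right, left ^ f[right ^ k]
    return left, right


def corr_sum(f, keys, aL, aR, bL, bR):
    """2^(2n) * (2 Pr[Y(0).(aL, aR) = Y(3).(bL, bR)] - 1), by exhaustive count."""
    size = len(f)
    total = 0
    for L0 in range(size):
        for R0 in range(size):
            L3, R3 = rounds(f, keys, L0, R0)
            total += 1 - 2 * (dot(L0, aL) ^ dot(R0, aR) ^ dot(L3, bL) ^ dot(R3, bR))
    return total


def check_b_and_c(f, k1, k2, k3):
    keys = (k1, k2, k3)
    size = len(f)
    for aL in range(size):
        for aR in range(size):
            for bL in range(size):
                # Appendix B: output mask (bL, 0), equation (14).
                sign_b = 1 - 2 * (dot(k1, aR ^ bL) ^ dot(k2, aL))
                if corr_sum(f, keys, aL, aR, bL, 0) != sign_b * lam(f, aR ^ bL, aL) * lam(f, aL, bL):
                    return False
                # Appendix C: output mask (bL, aL), equation (20).
                sign_c = 1 - 2 * (dot(k1, aR) ^ dot(k3, bL))
                if corr_sum(f, keys, aL, aR, bL, aL) != sign_c * lam(f, aR, aL) * lam(f, bL, aL):
                    return False
    return True


rng = random.Random(11)
f = list(range(8))
rng.shuffle(f)
print(check_b_and_c(f, rng.randrange(8), rng.randrange(8), rng.randrange(8)))  # True
```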



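D Proof of Lemma ??

We define the transformed key $K^* = (k_1^*, k_2^*, \ldots, k_r^*)$ corresponding to the original key $K = (k_1, k_2, \ldots, k_r)$. We assume the change $(k_a, k_b) \mapsto (k_a^*, k_b^*)$ ($a$: odd, $b$: even). If we define

$$k_i^* = \begin{cases} k_i + \Delta k_a & i\text{: odd} \\ k_i + \Delta k_b & i\text{: even} \end{cases}, \qquad \mu = (\Delta k_b, \Delta k_a), \qquad \nu = \begin{cases} (\Delta k_a, \Delta k_b) & r\text{: odd} \\ (\Delta k_b, \Delta k_a) & r\text{: even} \end{cases},$$

then $E_{K^*}(Y(0) + \mu) = E_K(Y(0)) + \nu$ holds.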
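The key-shift identity can be tested directly on a toy cipher. This sketch assumes the Feistel round $(L, R) \mapsto (R, L + f(R + k_i))$ with no final swap and XOR as the group operation; `check_key_shift` is a hypothetical helper. For this particular identity $f$ need not be bijective, so a random lookup table is used.

```python
import random


def feistel(f, keys, left, right):
    # Assumed round structure: (L, R) -> (R, L xor f(R xor k_i)), no final swap.
    for k in keys:
        left, right = right, left ^ f[right ^ k]
    return left, right


def check_key_shift(n, r, rng):
    """Check E_{K*}(Y(0) + mu) = E_K(Y(0)) + nu for every input Y(0) = (L0, R0)."""
    size = 1 << n
    f = [rng.randrange(size) for _ in range(size)]  # any F-function works here
    keys = [rng.randrange(size) for _ in range(r)]
    dka, dkb = rng.randrange(size), rng.randrange(size)
    # k*_i = k_i + dka for odd i, k_i + dkb for even i (rounds numbered from 1).
    keys_star = [k ^ (dka if i % 2 == 1 else dkb) for i, k in enumerate(keys, start=1)]
    mu = (dkb, dka)
    nu = (dka, dkb) if r % 2 == 1 else (dkb, dka)
    for L0 in range(size):
        for R0 in range(size):
            L, R = feistel(f, keys, L0, R0)
            if feistel(f, keys_star, L0 ^ mu[0], R0 ^ mu[1]) != (L ^ nu[0], R ^ nu[1]):
                return False
    return True


rng = random.Random(1)
print([check_key_shift(4, r, rng) for r in range(1, 7)])  # all True
```

The mechanism is visible in the code: in each round the added key difference cancels the difference on the F-function input, so the differences $\Delta k_a$ and $\Delta k_b$ simply swap halves from round to round.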
E Proof of Corollary ??

It is sufficient to prove that the differential probability satisfies (??).

$$\forall \mu, \nu: \Pr_{Y(0),Y^*(0)}\bigl[E(Y(0)+\mu) + \nu + E(Y^*(0)+\mu) + \nu = \beta \bigm| \Delta Y(0) = \alpha\bigr] \tag{21}$$

$$= \Pr_{Y(0),Y^*(0)}\bigl[E(Y(0)+\mu) + E(Y^*(0)+\mu) = \beta \bigm| (Y(0)+\mu) + (Y^*(0)+\mu) = \alpha\bigr] \tag{22}$$

$$= \Pr_{Y(0)+\mu,\,Y^*(0)+\mu}\bigl[E(Y(0)) + E(Y^*(0)) = \beta \bigm| Y(0) + Y^*(0) = \alpha\bigr] \tag{23}$$

$$= \Pr_{Y(0),Y^*(0)}\bigl[E(Y(0)) + E(Y^*(0)) = \beta \bigm| \Delta Y(0) = \alpha\bigr] \tag{24}$$

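F Proof of Corollary ??

It is sufficient to prove that the linear probability satisfies (??).

$$\forall \mu, \nu: \bigl(2\Pr[(Y(0)+\mu)\cdot\alpha = (E(Y(0))+\nu)\cdot\beta] - 1\bigr)^2 \tag{25}$$

$$= \bigl(2\Pr[Y(0)\cdot\alpha = E(Y(0))\cdot\beta + (\mu\cdot\alpha + \nu\cdot\beta)] - 1\bigr)^2 \tag{26}$$

$$= \bigl(2\Pr[Y(0)\cdot\alpha = E(Y(0))\cdot\beta] - 1\bigr)^2 \tag{27}$$

The step from (26) to (27) holds because $\mu\cdot\alpha + \nu\cdot\beta$ is a constant bit: it can only flip the sign of $2\Pr[\cdot] - 1$, which the square removes.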



G Proof of Lemma ??

Completeness: $\forall k \in GF(2)^n\,[x + k \in S(GF(2)^n)]$ holds, and a composition of bijective functions is bijective; hence $f(x) + k \in S(GF(2)^n)$ whenever $f \in S(GF(2)^n)$.

Reflexive law: If $k = 0 \in GF(2)^n$, then $\forall f \in S(GF(2)^n)\,[f(x) = f(x) + k]$ holds. So $f \sim f$.

Symmetric law: If $f \sim g$, then $\exists k \in GF(2)^n\,[f(x) = g(x) + k]$ holds. Thus $g(x) = f(x) + k$ holds, i.e., $g \sim f$.

Transitive law: If $f \sim g$ and $g \sim h$ hold, then $\exists k \in GF(2)^n\,[f(x) = g(x) + k]$ and $\exists l \in GF(2)^n\,[g(x) = h(x) + l]$ hold, so $f(x) = g(x) + k = (h(x) + l) + k = h(x) + (l + k)$. Since $l + k \in GF(2)^n$, $f \sim h$ holds.



H Proof of Lemma ??

We define $R = \{f \in S(GF(2)^n) \mid f(x_0) = y_0\}$ and prove that $R$ is a set of representatives. Let $f, g \in R$ and assume $C(f) = C(g)$. In this case $f \sim g$ holds, so $f(x) = g(x) + k$ for some $k \in GF(2)^n$. By the definition of $R$, $f(x_0) = g(x_0) = y_0$, which forces $k = 0$. Thus $f = g$ holds, since $f(x) = g(x) + 0$.

We prove $C(f) = \{f(x) + k \mid k \in GF(2)^n\}$:

$$g \in \{f(x) + k \mid k \in GF(2)^n\} \iff \exists k \in GF(2)^n\,[g(x) = f(x) + k] \iff g \sim f \iff g \in C(f).$$

Hence $S(GF(2)^n) \supseteq \bigcup_{f \in R} C(f)$ holds. On the other hand, $\#S(GF(2)^n) = (2^n)!$ and

$$\#\Bigl(\bigcup_{f \in R} C(f)\Bigr) = \#R \times \#C(f) = (2^n - 1)! \times 2^n = (2^n)!,$$

so $S(GF(2)^n) = \bigcup_{f \in R} C(f)$ holds. That is, $R$ is a complete set of representatives of $S(GF(2)^n)/\sim$.
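For small $n$ the counting argument can be confirmed exhaustively. The sketch below (a hypothetical helper, with XOR as $+$ on GF(2)^n) enumerates $R$, builds each class $C(f) = \{f(x) + k\}$, and checks that the classes have size $2^n$, are pairwise disjoint, and together cover all $(2^n)!$ permutations.

```python
from itertools import permutations
from math import factorial


def check_representatives(n, x0=0, y0=0):
    """Check that R = {f : f(x0) = y0} is a complete set of representatives of
    S(GF(2)^n)/~, where f ~ g iff f(x) = g(x) + k for some k (+ is XOR)."""
    size = 1 << n
    R = [f for f in permutations(range(size)) if f[x0] == y0]
    covered = set()
    for f in R:
        cls = {tuple(v ^ k for v in f) for k in range(size)}  # C(f) = {f(x) + k}
        if len(cls) != size or not covered.isdisjoint(cls):
            return False  # #C(f) must be 2^n, and distinct representatives give disjoint classes
        covered |= cls
    return len(R) == factorial(size - 1) and len(covered) == factorial(size)


print(check_representatives(2), check_representatives(3))  # True True
```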
Abstract. RC4, a stream cipher designed by Rivest for RSA Data Security Inc., has found several commercial applications, but little public analysis has been done to date. In this paper, alleged RC4 (hereafter called RC4) is described and existing analysis outlined. The properties of RC4, and in particular its cycle structure, are discussed. Several variants of a basic "tracking" attack are described, and we provide experimental results on their success for scaled-down versions of RC4. This analysis shows that, although the full-size RC4 remains secure against known attacks, keystreams are distinguishable from randomly generated bit streams, and the RC4 key can be recovered if a significant fraction of the full cycle of keystream bits is generated (while recognizing that for a full-size system, the cycle length is too large for this to be practical). The tracking attacks discussed provide a significant improvement over the exhaustive search of the full RC4 keyspace. For example, the state of
a 5-bit RC4-like cipher can be obtained from a portion of the keystream using 2^42 steps, while the nominal keyspace of the system is 2^160. More work is necessary to improve these attacks in the case where a reduced keyspace is used.



1 Introduction 

Stream ciphers are often used in applications where high speed and low delay are a requirement. Although many stream ciphers are based on linear feedback shift registers, the need for software-oriented stream ciphers has led to several alternative proposals. One of the more promising algorithms, RC4¹, designed by R. Rivest for RSA Data Security Inc., has been incorporated into many commercial products including BSAFE and Lotus Notes, and is being considered in upcoming standards such as TLS [?].

In this paper, the RC4 algorithm is described and known attacks are reviewed. A detailed discussion of "tracking" attacks is provided, and estimates of the complexity of cryptanalysis for simplified versions of RC4 are given.

¹ While RC4 remains a trade secret of RSA Data Security Inc., the algorithm described in [?] is believed to be output-compatible with RC4. This paper discusses the algorithm given in [?], which is referred to as RC4 for convenience.

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 1999. 

© Springer-Verlag Berlin Heidelberg 1999 
2 Background

The following description of RC4 is based on that given in [?]. It generalizes RC4 to use n-bit words, but n = 8 is the most commonly used value. To use RC4, a key is first used to initialize the $2^n$-word s-box $S$ and the counters $i$ and $j$ through Algorithm 1. The keystream $\mathcal{K}$ is then generated using Algorithm 2. The s-box entries and the counters $i$ and $j$ are n-bit words.

Algorithm 1 (RC4 Initialization). Let $k_0 \ldots k_{\ell-1}$ denote the user's key, a set of $\ell$ n-bit words.

1. For $z$ from 0 to $2^n - 1$:
   (a) Set $K_z = k_{z \bmod \ell}$.

2. For $z$ from 0 to $2^n - 1$:
   (a) Set $S_z = z$.

3. Set $j = 0$.

4. For $i$ from 0 to $2^n - 1$:
   (a) Set $j = j + S_i + K_i \bmod 2^n$.
   (b) Swap $S_i$ and $S_j$.

5. Set $i = 0$ and $j = 0$.

Algorithm 2 (Keystream Generation).

1. Set $i = i + 1 \bmod 2^n$.

2. Set $j = j + S_i \bmod 2^n$.

3. Swap $S_i$ and $S_j$.

4. Output $S_{S_i + S_j \bmod 2^n}$ as the next word in the keystream.
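Algorithms 1 and 2 translate almost line for line into code. The sketch below is a plain reading of the pseudocode for general $n$ (the function names are ours); for $n = 8$ it is checked against a widely published RC4 test vector rather than against any vector from this paper.

```python
def rc4_init(key, n=8):
    """Algorithm 1: build the 2^n-entry s-box S from a key given as n-bit words."""
    size = 1 << n
    K = [key[z % len(key)] for z in range(size)]  # step 1: repeat the key
    S = list(range(size))                          # step 2: identity permutation
    j = 0                                          # step 3
    for i in range(size):                          # step 4
        j = (j + S[i] + K[i]) % size
        S[i], S[j] = S[j], S[i]
    return S


def rc4_keystream(S, count, n=8):
    """Algorithm 2, started from i = j = 0 (step 5 of Algorithm 1); returns `count` words."""
    size = 1 << n
    S = list(S)  # work on a copy so the caller's s-box is unchanged
    i = j = 0
    out = []
    for _ in range(count):
        i = (i + 1) % size
        j = (j + S[i]) % size
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % size])
    return out


# n = 8: key "Key", plaintext "Plaintext" (a commonly cited RC4 test vector).
ks = rc4_keystream(rc4_init(list(b"Key")), 9)
print(bytes(p ^ k for p, k in zip(b"Plaintext", ks)).hex())  # bbf316e8d940af0ad3

# A scaled-down RC4-3 instance, as studied later in the paper.
print(rc4_keystream(rc4_init([1, 6, 3], n=3), 10, n=3))
```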

The RC4 keystream generation algorithm is depicted in Fig. 1.

The initialization algorithm is a key-dependent variant of the keystream generation algorithm, and is used to initialize the s-box $S$ to a "randomly chosen" permutation. The nominal key length could be up to $n \cdot 2^n$ bits, but since the key is used to generate only a permutation of $2^n$ values, the entropy it provides is at most $\log_2((2^n)!)$ bits, which will be referred to as the effective key length. Table 1 shows the nominal and effective key lengths for different values of n. In the remainder of this paper, the mod $2^n$ is sometimes omitted for brevity.
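Both columns of Table 1 follow from the formulas above: nominal length $n \cdot 2^n$ and effective length $\log_2((2^n)!)$. A short sketch, using `math.lgamma` so that $(2^n)!$ is never formed explicitly (the function name is ours):

```python
import math


def key_lengths(n):
    """Nominal (n * 2^n) and effective (log2((2^n)!)) key lengths of RC4-n, in bits."""
    size = 1 << n
    nominal = n * size
    effective = math.lgamma(size + 1) / math.log(2)  # log2((2^n)!) = ln((2^n)!) / ln 2
    return nominal, effective


for n in range(2, 10):
    nominal, effective = key_lengths(n)
    print(f"{n}  {nominal:5d}  {effective:8.2f}")
```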

3 Published Results

This section is based on [?].



3.1 A Class of Weak Keys 

In 1995, Andrew Roos posted a paper to the sci.crypt newsgroup [?] describing a class of weak keys, for which the initial byte of the keystream is highly correlated with the first few key bytes. The weak keys are those satisfying

$$k_0 + k_1 \equiv 0 \bmod 2^n.$$
[Figure: the s-box $S_0, S_1, \ldots, S_{2^n-1}$ with pointers $i$ and $j$, annotated with the keystream generation steps: 1. Increment $i$ by 1. 2. Increment $j$ by $S_i$. 3. Swap $S_i$ and $S_j$. 4. Output $S_{S_i+S_j}$.]

Fig. 1. RC4 Keystream Generation



Table 1. Nominal and Effective Key Sizes for RC4-n

RC4 Word Size   Nominal Key Length (bits)   Effective Key Length (bits)
2                    8                          4.58
3                   24                         15.30
4                   64                         44.25
5                  160                        117.66
6                  384                        296.00
7                  896                        716.16
8                 2048                       1684.00
9                 4608                       3875.17



The weak keys occur because the keystream initialization algorithm swaps a 
given entry of the s-box exactly once (corresponding to when the pointer i points 
to the entry) with probability 1/e. In addition, for low values of i, it is likely 
that Sj = j during the initialization. The reduction in search effort from this 
attack is 2®-^, but if linearly related session keys are used, the reduction in effort 
increases to 2^®. 
3.2 Linear Statistical Weaknesses in RC4

In [?], the author derives a linear model of RC4 using the linear sequential circuit approximation (LSCA) method. The model has correlation coefficient $c = 15 \cdot 2^{-3n}$ and requires $64^n/225 = 1/c^2$ keystream words. The model is successful in part because the s-box evolves slowly.



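3.3 A Set of Short Cycles

Suppose that $i = a$, $j = a + 1$, and $S_{a+1} = 1$ for some $a$. Then, after one iteration, $i = a + 1$, $j = a + 2$, and $S_{a+2} = 1$. Thus, the original relationship is preserved. Each such cycle has length $2^n \cdot (2^n - 1)$, and $(2^n - 2)!$ such cycles exist. Note, however, that because RC4 is initialized to $i = j = 0$, which does not satisfy $j = i + 1$, and because the keystream generation function is invertible, a trajectory starting from an initialized state never enters these cycles; they therefore never occur in practice. These observations were first made in [?] and outlined in [?].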
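The short cycles are easy to observe on a scaled-down cipher. This sketch (function names are ours) starts from a state with $i = a$, $j = a + 1$, $S_{a+1} = 1$, asserts at every step that the relationship is preserved, and counts the steps until the state recurs; for RC4-3 the expected length is $2^3 \cdot (2^3 - 1) = 56$.

```python
import random


def step(state, n):
    """One application of Algorithm 2 to the state (i, j, S); the output word is ignored."""
    i, j, S = state
    size = 1 << n
    S = list(S)
    i = (i + 1) % size
    j = (j + S[i]) % size
    S[i], S[j] = S[j], S[i]
    return i, j, tuple(S)


def short_cycle_length(n, a, S):
    """Iterate from (i, j) = (a, a + 1) with S[a + 1] = 1 and return the cycle length."""
    size = 1 << n
    assert S[(a + 1) % size] == 1, "not a state of the special form"
    start = (a, (a + 1) % size, tuple(S))
    state, steps = step(start, n), 1
    while state != start:
        i, j, T = state
        assert j == (i + 1) % size and T[j] == 1  # relationship preserved at every step
        state, steps = step(state, n), steps + 1
    return steps


# Build a random RC4-3 state of the special form and measure its cycle.
n, a = 3, 5
rng = random.Random(0)
rest = [v for v in range(1 << n) if v != 1]
rng.shuffle(rest)
S = rest[: a + 1] + [1] + rest[a + 1:]  # places the value 1 at index a + 1
print(short_cycle_length(n, a, S), (1 << n) * ((1 << n) - 1))  # 56 56
```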

4 Cycle Structures in RC4 

4.1 Comparison with Randomly Chosen Invertible Mappings 

The state of RC4 is fully determined by the two n-bit counters $i$ and $j$ and the s-box $S$. Since the number of states is finite, the sequence of states must eventually become periodic as the keystream generation function is iterated; because the keystream generation function is invertible, the sequence is in fact purely periodic, so every state lies on a cycle. The length of the period depends on the word size $n$ and the particular starting state, as illustrated in Table 2 for $n = 2$ and $n = 3$. For each period, the number of distinct cycles of that period is listed, followed by the number of initial states in each cycle, expressed as a formal sum. The last three columns will be explained in the next section. For comparison, Fig. 2 plots the expected cycle lengths for a randomly chosen permutation and those observed for RC4. For the randomly chosen permutation, the minimum and maximum lengths observed for the $k$th longest cycle in a set of 1500 arbitrarily chosen permutations are plotted.
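The invertibility claim can be made concrete: a step of Algorithm 2 is undone by reversing the swap, then the update of $j$, then the update of $i$. A minimal sketch (function names are ours) with an exhaustive check over all $4 \cdot 4 \cdot 4! = 384$ states of RC4-2:

```python
from itertools import permutations


def step(state, n):
    i, j, S = state
    size = 1 << n
    S = list(S)
    i = (i + 1) % size
    j = (j + S[i]) % size
    S[i], S[j] = S[j], S[i]
    return i, j, tuple(S)


def unstep(state, n):
    """Inverse of `step`: undo the swap, then the update of j, then the update of i."""
    i, j, S = state
    size = 1 << n
    S = list(S)
    S[i], S[j] = S[j], S[i]  # S[i] is again the value that was added to j
    j = (j - S[i]) % size
    i = (i - 1) % size
    return i, j, tuple(S)


states = [(i, j, S) for i in range(4) for j in range(4) for S in permutations(range(4))]
print(len({step(s, 2) for s in states}), all(unstep(step(s, 2), 2) == s for s in states))  # 384 True
```

Because `step` is a bijection on a finite set, every orbit returns to its starting state, which is exactly why the state sequence is purely periodic rather than merely eventually periodic.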

It has also been observed that RC4 keystream sequences are slightly biased [?]. Define the gap at $i$ for a sequence $s$ to be the smallest integer $t \ge 0$ such that $s_i = s_{i-t-1}$. For a random sequence in which each element takes on one of $2^n$ values, the probability that $t = k$ is given by

$$\left(\frac{2^n - 1}{2^n}\right)^k \frac{1}{2^n}.$$

Table 3 shows the ratio of the actual to the expected gap probability, based on
a sample of approximately 2^^^ elements of an arbitrarily chosen RC4 keystream. 
For all values of n, gaps of length 0 are more likely than expected, and gaps 
of length 1 are less likely than expected. In support of this, it has also been 
observed that the probability that = 0 is lower than expected and that the 
probability that = 2" — 1 is higher than expected after a gap of length 0. 
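The gap statistic and the ratio reported in Table 3 can be computed as follows. This is a sketch under our own conventions: positions whose value has not occurred before are skipped, and the baseline run uses Python's pseudorandom generator rather than RC4, so its ratios should all be close to 1.

```python
import random
from collections import Counter


def gaps(seq):
    """Gap at i: smallest t >= 0 with seq[i] == seq[i - t - 1]; positions without an
    earlier occurrence of the same value are skipped."""
    last_seen = {}
    out = []
    for i, v in enumerate(seq):
        if v in last_seen:
            out.append(i - last_seen[v] - 1)  # most recent earlier occurrence gives the smallest t
        last_seen[v] = i
    return out


def expected_gap_probability(n, k):
    """((2^n - 1) / 2^n)^k * (1 / 2^n) for a uniformly random sequence of n-bit words."""
    q = 1 / (1 << n)
    return (1 - q) ** k * q


def gap_ratios(seq, n, max_gap=6):
    """Observed / expected gap probability for gaps 0..max_gap."""
    g = gaps(seq)
    counts = Counter(g)
    return [counts[k] / len(g) / expected_gap_probability(n, k) for k in range(max_gap + 1)]


rng = random.Random(0)
print([round(r, 3) for r in gap_ratios([rng.randrange(8) for _ in range(200_000)], 3)])
```

Feeding an RC4-n keystream into `gap_ratios` instead would give quantities of the kind tabulated in Table 3; the sample size needed to resolve deviations of the order shown there grows quickly with n.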



Table 2. Possible Periods for RC4 with Word Length 2 and 3

Period   Cycles   Initial States   Shift Generator   Offset   Shifts Found
648      1        8                1                 81       {0, 1, 2, 3, 4, 5, 6, 7}
472      2        7 + 7 = 14       6                 118      {0, 2, 4, 6}, {0, 2, 4, 6}
456      1        12               1                 57       {0, 1, 2, 3, 4, 5, 6, 7}
264      2        5 + 7 = 12       2                 66       {0, 2, 4, 6}, {0, 2, 4, 6}
120      2        2 + 2 = 4        7                 15       {0, 1, 2, 3, 4, 5, 6, 7}, ...
[Figure: "Cycle Lengths of Random Permutations Compared with RC4-3"; cycle length (log scale) versus cycle number (sorted by length), for random permutations and RC4-3.]

Fig. 2. Comparison of Cycle Lengths for RC4 and Random Permutations



Table 3. Deviation of RC4 Gap Lengths from those of Random Keystream (ratio of actual to expected gap probability)

n \ Gap   0          1          2          3          4          5          6
2         1.04082    0.952381   0.834467   0.870748   0.902998   1.72       1.26133
3         1.01828    0.956577   1.01042    0.994535   1.02179    1.00909    1.00284
4         1.00365    0.993622   1.0009     1.00126    1.00276    1.00039    1.00059
5         1.00099    0.99859    0.999946   1.0009     1.00081    1.00024    1.00046
6         0.999762   0.999901   1.00024    1.00039    1.00036    1.00014    0.999714
4.2 Partitioning of RC4 Cycles

As observed independently in [?], individual RC4 cycles can be partitioned into pieces of "equivalent" subsets, as follows. Denote by $S^{\rightarrow d}$ the s-box obtained by rotating the s-box entries to the right (or down) by $d$ (formally, $S' = S^{\rightarrow d}$ if $S'_t = S_{t-d \bmod 2^n}$). Let the right shift by $d$ of an RC4 state $(i, j, S)$ be defined by $(i + d, j + d, S^{\rightarrow d})$. The following theorem holds:



Theorem 1. Suppose that, for a given key, an RC4-n system goes through the state $(i', j', S')$ and that the cycle length for this key is $T$. Then any cycle going through one or more states of the form $(i' + d, j' + d, S'^{\rightarrow d})$ (where $d$ is an integer) has period $T$, and the shift relationship between the states is maintained as the two systems evolve. In addition, if the output sequences are compared word for word as the systems evolve beyond those states, the outputs will always differ if $d \not\equiv 0 \bmod 2^n$ and will always agree otherwise.



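Proof. Compare the state evolutions of the two systems $(i', j', S')$ and $(i'' = i' + d,\ j'' = j' + d,\ S'' = S'^{\rightarrow d})$. The steps for one round of keystream generation of the second system are:

1. Set $i'' = i'' + 1 \bmod 2^n$.
2. Set $j'' = j'' + S''_{i''} \bmod 2^n$.
3. Swap $S''_{i''}$ and $S''_{j''}$.
4. Output $S''_{S''_{i''} + S''_{j''} \bmod 2^n}$ as the next word in the keystream.

or, since $S''_t = S'_{t-d}$,

1. Set $i'' = i'' + 1 \bmod 2^n$.
2. Set $j'' = j'' + S'_{i''-d} \bmod 2^n$.
3. Swap $S'_{i''-d}$ and $S'_{j''-d}$.
4. Output $S'_{S'_{i''-d} + S'_{j''-d} - d \bmod 2^n}$ as the next word in the keystream.

which, writing $i'' = i' + d$ and $j'' = j' + d$, becomes:

1. Set $i' = i' + 1 \bmod 2^n$ (so that $i'' = i' + d$).
2. Set $j' = j' + S'_{i'} \bmod 2^n$ (so that $j'' = j' + d$).
3. Swap $S'_{i'}$ and $S'_{j'}$.
4. Output $S'_{S'_{i'} + S'_{j'} - d \bmod 2^n}$ as the next word in the keystream.

These are the steps of the first system, except that the first system outputs $S'_{S'_{i'} + S'_{j'} \bmod 2^n}$. Thus, the shift relationship between the two systems is preserved, and only the output differs, provided $d \not\equiv 0 \bmod 2^n$ (the two outputs are then distinct entries of the permutation $S'$). Because the systems are identical except for the output, the periods of the two systems must also be the same. □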
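Theorem 1 can be spot-checked by running a state and its right shift side by side. The sketch below (function names are ours) verifies after every step that the second state is still the right shift by $d$ of the first, and that the two outputs agree exactly when $d \equiv 0 \bmod 2^n$.

```python
import random


def step(state, size):
    """One round of Algorithm 2; returns the new state and the output word."""
    i, j, S = state
    S = list(S)
    i = (i + 1) % size
    j = (j + S[i]) % size
    S[i], S[j] = S[j], S[i]
    return (i, j, tuple(S)), S[(S[i] + S[j]) % size]


def shift(state, d, size):
    """Right shift by d: (i + d, j + d, S^{->d}) with S^{->d}[t] = S[t - d]."""
    i, j, S = state
    return (i + d) % size, (j + d) % size, tuple(S[(t - d) % size] for t in range(size))


def check_theorem1(n, d, steps, rng):
    size = 1 << n
    S = list(range(size))
    rng.shuffle(S)
    a = (rng.randrange(size), rng.randrange(size), tuple(S))
    b = shift(a, d, size)
    for _ in range(steps):
        a, out_a = step(a, size)
        b, out_b = step(b, size)
        if b != shift(a, d, size):               # shift relationship must persist
            return False
        if (out_a == out_b) != (d % size == 0):  # outputs agree iff d = 0 mod 2^n
            return False
    return True


rng = random.Random(3)
print(all(check_theorem1(3, d, 500, rng) for d in range(8)))  # True
```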

Consider a cycle of period $T$ and an arbitrary RC4 state $(i', j', S')$ in that cycle. Then, by Theorem 1, all shifts of this state belong to cycles of length $T$. If there are only a few cycles of length $T$ (as will be the case if $T$ is large), then more than one shift may appear in the same cycle. The following theorem holds:
Theorem 2 (Cycle Partitioning). Consider a cycle of period $T$ and let $D = (i', j', S')$ be any state in the cycle. Let $d_1, \ldots, d_{k-1}$ ($k \le 2^n$) be the right shifts of $D$ in the order in which they appear as RC4 evolves from state $D$ ($d_0 = 0$ is understood). Then the distance (expressed as the number of encryptions) between successive shifts is $T/k$, and for any other state $D'$ in the same cycle, the right shifts of $D'$ by $d_1, \ldots, d_{k-1}$ are the only right shifts of $D'$ appearing in the cycle, and they appear in that order. Figure 3 illustrates this partitioning, with $a = T/k$ and $k = 4$.

Proof. Denote by $D_t$ the right shift by $d_t$ of $D$, and by $D(s)$ the RC4 state obtained by performing $s$ encryptions starting at state $D$. Let $\ell$ be the greatest distance between two consecutive shifts, and denote the corresponding shifts $d_a$ and $d_{a+1 \bmod k}$. Suppose that for some $b$, the distance $s$ between $d_b$ and $d_{b+1 \bmod k}$ were smaller than $\ell$. By Theorem 1, $D_a$ remains a right shift of $D_b$ as the systems evolve. Since $D_b(s)$ is a right shift of $D_b$, $D_a(s)$ must be a right shift of $D_a$. Thus, the distance between $D_a$ and $D_{a+1 \bmod k}$ is less than or equal to $s$. But that distance is $\ell$, contradicting the assumption that $s < \ell$. Therefore, no smaller distance exists, and the shifts are at equal distances from each other. The second part of the theorem follows by Theorem 1 and the fact that any state in the cycle can be obtained by repeated encryption starting at any state $D$. □

In fact, only certain orderings of the shifts present in a given cycle are possible, because Theorems 1 and 2 imply that $d_{i+1} - d_i$ is constant in a cycle. $d_1$ must then be a generator for the shifts in the cycle, and $d_i = i \cdot d_1 \bmod 2^n$. The last three columns of Table 2 confirm this statement for RC4-2 and RC4-3. In this table, the "Shift Generator" entry is the value of $d_1$, and the entry "Offset" is the distance between successive shifts. Finally, the "Shifts Found" column enumerates the right shifts of the initial state found in each cycle. All of these results were obtained experimentally. Note that in all cases $T/k = \text{Offset}$, as required by the theorem. For example, for the cycles of length 472, a distance of 118 exists between shifts, the shifts appear in the order {0, 6, 4, 2}, and 472/4 = 118. The entry {0} in the "Shifts Found" column indicates that no shifts of the initial state other than the state itself appear in the cycle.
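For RC4-2 the partitioning can be verified over every state. The sketch below (function names are ours) walks the cycle through each starting state, records the times at which a right shift of the start is reached, and checks that those times are the multiples of $T/k$ and that the shift reached at the $m$-th of them is $m \cdot d_1 \bmod 2^n$.

```python
from itertools import permutations


def step(state, size):
    i, j, S = state
    S = list(S)
    i = (i + 1) % size
    j = (j + S[i]) % size
    S[i], S[j] = S[j], S[i]
    return i, j, tuple(S)


def shift(state, d, size):
    i, j, S = state
    return (i + d) % size, (j + d) % size, tuple(S[(t - d) % size] for t in range(size))


def shift_schedule(start, size):
    """Walk the cycle through `start`; return its period T and the (time, d) pairs at
    which the right shift of `start` by d is reached."""
    hits, state, t = [], start, 0
    while True:
        d = (state[0] - start[0]) % size  # the only candidate, since i moves by exactly d
        if state == shift(start, d, size):
            hits.append((t, d))
        state, t = step(state, size), t + 1
        if state == start:
            return t, hits


def partition_holds(start, size):
    T, hits = shift_schedule(start, size)
    k = len(hits)
    d1 = hits[1][1] if k > 1 else 0
    return T % k == 0 and all(t == m * (T // k) and d == (m * d1) % size
                              for m, (t, d) in enumerate(hits))


size = 4  # RC4-2: check every one of the 384 states
states = [(i, j, S) for i in range(size) for j in range(size) for S in permutations(range(size))]
print(all(partition_holds(s, size) for s in states))  # True
```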

5 Tracking Analysis 

Algorithm 3 below outlines a basic attack which can be mounted against RC4. In essence, the algorithm keeps track of all states which RC4 could be in, given that a particular keystream has been generated.
Fig. 3. Partitioning of an RC4 Cycle



Algorithm 3 (Forward Tracking).

1. Mark all entries $S_t$ as unassigned.

2. Set $i = 0$, $j = 0$, and $z = 0$.

3. Repeat:
   (a) Set $i = i + 1 \bmod 2^n$.
   (b) If $S_i$ is unassigned, continue with the remainder of the algorithm for each possible assignment of $S_i$.
   (c) Set $j = j + S_i \bmod 2^n$.
   (d) If $S_j$ is unassigned, continue with the remainder of the algorithm for each possible assignment of $S_j$.
   (e) Swap $S_i$ and $S_j$.
   (f) Set $t = S_i + S_j \bmod 2^n$.
   (g) If $S_t$ is unassigned and $\mathcal{K}_z$ does not yet appear in the s-box, set $S_t = \mathcal{K}_z$.
   (h) If $S_t \neq \mathcal{K}_z$, the state information is incorrect. Terminate this branch.
   (i) Increment $z$.
   (j) If $z$ is equal to the length of the keystream, output the current state as a solution and terminate the run.
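Algorithm 3 maps naturally onto a recursive depth-first search. The sketch below is our own reading of the pseudocode (the names, and the use of `None` for unassigned entries, are ours). On the all-zero RC4-2 keystream of length 4 it should reproduce the single surviving node of Fig. 4; the second check confirms, for a random RC4-3 state, that the true final state is consistent with one of the solutions.

```python
import random


def forward_track(keystream, n):
    """Depth-first sketch of Algorithm 3: returns every (possibly partially assigned)
    final state (i, j, S) consistent with the observed keystream; None = unassigned."""
    size = 1 << n
    solutions = []

    def options(S, pos):
        # An assigned entry keeps its value; an unassigned one may take any unused value.
        if S[pos] is not None:
            return [S[pos]]
        used = {v for v in S if v is not None}
        return [v for v in range(size) if v not in used]

    def search(i, j, S, z):
        if z == len(keystream):                  # step (j)
            solutions.append((i, j, tuple(S)))
            return
        i = (i + 1) % size                       # step (a)
        for si in options(S, i):                 # step (b)
            S1 = list(S)
            S1[i] = si
            j1 = (j + si) % size                 # step (c)
            for sj in options(S1, j1):           # step (d)
                S2 = list(S1)
                S2[j1] = sj
                S2[i], S2[j1] = S2[j1], S2[i]    # step (e)
                t = (S2[i] + S2[j1]) % size      # step (f)
                if S2[t] is None and keystream[z] not in S2:
                    S2[t] = keystream[z]         # step (g)
                if S2[t] == keystream[z]:        # step (h)
                    search(i, j1, S2, z + 1)     # step (i)

    search(0, 0, [None] * size, 0)
    return solutions


def rc4_run(S, count, n):
    """Run Algorithm 2 from i = j = 0; return the keystream and the final state."""
    size = 1 << n
    S, i, j, out = list(S), 0, 0, []
    for _ in range(count):
        i = (i + 1) % size
        j = (j + S[i]) % size
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % size])
    return out, (i, j, tuple(S))


print(forward_track([0, 0, 0, 0], 2))  # [(0, 3, (2, 0, 1, 3))], the depth-4 node of Fig. 4

rng = random.Random(5)
S0 = list(range(8))
rng.shuffle(S0)
ks, true_final = rc4_run(S0, 10, 3)
found = forward_track(ks, 3)
# The true final state must agree with some solution on every assigned entry.
print(any(i == true_final[0] and j == true_final[1]
          and all(v is None or v == true_final[2][t] for t, v in enumerate(S))
          for i, j, S in found))  # True
```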

The forward tracking algorithm is illustrated in Fig. 4. In this diagram, the observed keystream is the all-zero sequence, and the system is RC4-2. Figure 5 shows the number of nodes visited by the tracking algorithm as a function of depth for various word sizes $n$. Two cases are considered for the observed keystream: an arbitrarily chosen nonzero keystream and a zero keystream. The zero keystream can be analysed more quickly than a more general keystream. To obtain approximate data for the $n = 5$ case in Fig. 5, the depth-first search at a given depth was carried out only for selected nodes. The total number of nodes visited was then calculated assuming that the number of nodes in each subtree would be the same for all nodes. Similar work has been done by Luke O'Connor [?].



[Figure: tracking tree with columns Depth, Observed Keystream, and Tree Diagram, with nodes written as (i, j, {S_0, S_1, S_2, S_3}).
Depth 0, keystream none: (0, 0, {?, ?, ?, ?}). All initial states can generate the keystream of length 0 correctly. 1 node, 24 total states.
Depth 1, keystream 0: (1, 2, {?, 1, 2, 0}), (1, 1, {?, 1, 0, ?}), (1, 3, {0, 1, ?, 3}). 3 nodes, 4 total states.
Depth 2, keystream 0, 0: (2, 1, {?, 0, 1, ?}). 1 node, 2 total states.
Depth 3, keystream 0, 0, 0: (3, 0, {3, 0, 1, 2}). 1 node, 1 total state. A unique solution has been found.
Depth 4, keystream 0, 0, 0, 0: (0, 3, {2, 0, 1, 3}). 1 node, 1 total state.]

Fig. 4. Forward Tracking Algorithm for n = 2, and a 0 Keystream of Length 4



Several variations of this algorithm are possible. Backtracking, in which the keystream is processed in reverse, appears to be easier to implement efficiently because s-box entries are fixed sooner. Probabilistic variants, which use truncated tracking analysis or other information to determine the "best" node to follow in the tracking analysis, may be able to analyse more keystream, avoiding keystreams which result in a more difficult search. These variants are discussed in [?].

The performance of these algorithms can be used to provide an upper bound on the complexity of RC4 cryptanalysis. Table 4 shows the attack complexity observed to date. All of the results are based on backtracking, except the nonzero-keystream n = 5 entry, which is based on an estimate from a probabilistic backtracking attack.
[Figure: number of nodes which can generate a keystream of a given length, versus depth, for various word sizes n and for zero and nonzero keystreams.]

Fig. 5. Number of Nodes During Forward Tracking



6 An Attack on a Weakened Version of RC4 

Suppose that RC4 was modified by replacing its initialization function with the following:

Algorithm 4 (Weak RC4 Initialization). Let $k_0 \ldots k_{\ell-1}$ denote the user's key.

1. Calculate $\gamma$ such that $\log_2(\gamma!) \approx 8 \cdot \ell$.

2. For $z$ from 0 to $2^n - 1$:
   (a) Set $K_z = k_{z \bmod \ell}$.

3. For $z$ from 0 to $2^n - 1$:
   (a) Set $S_z = z$.

4. Set $j = 0$.

5. For $i$ from 0 to $2^n - 1$:
   (a) Set $j = j + S_i + K_i \bmod 2^n$.
   (b) Swap $S_{i \bmod \gamma}$ and $S_{j \bmod \gamma}$.

6. Set $i = 0$ and $j = 0$.
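A sketch of Algorithm 4 under one stated assumption: step 1 only fixes $\gamma$ approximately, so the hypothetical helper `gamma_for_key` picks the smallest $\gamma$ with $\log_2(\gamma!) \ge 8\ell$. The same log-factorial function, rounded down, reproduces the Effective Keyspace column of Table 5.

```python
import math


def log2_factorial(m):
    return math.lgamma(m + 1) / math.log(2)


def gamma_for_key(key_bytes):
    """Hypothetical helper for step 1: smallest gamma with log2(gamma!) >= 8 * key length."""
    gamma = 1
    while log2_factorial(gamma) < 8 * key_bytes:
        gamma += 1
    return gamma


def weak_rc4_init(key, gamma, n=8):
    """Algorithm 4: like Algorithm 1, but every swap is confined to S_0 .. S_{gamma-1}."""
    size = 1 << n
    K = [key[z % len(key)] for z in range(size)]
    S = list(range(size))
    j = 0
    for i in range(size):
        j = (j + S[i] + K[i]) % size
        a, b = i % gamma, j % gamma
        S[a], S[b] = S[b], S[a]
    return S


key = list(b"hello")             # 5 bytes = 40 bits
gamma = gamma_for_key(len(key))  # 15, since log2(15!) is about 40.25
S = weak_rc4_init(key, gamma)
print(gamma, S[gamma:] == list(range(gamma, 256)))  # 15 True: entries at index >= gamma never move
```

Because only the first $\gamma$ entries are ever permuted, the initial s-box takes at most $\gamma!$ values, which is what lets a tracking search confine itself to the reduced keyspace.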

The system still has $8\ell$ bits of entropy. However, because the tracking analysis can easily be confined to searching the reduced keyspace, it is likely to succeed very quickly even for the full-size cipher. Table 5 summarizes the performance of the tracking attack for this weakened variant of RC4. Table 5 was obtained by performing a tracking attack on 20 keystreams generated with randomly chosen keys for each value of $\gamma$; the maximum observed complexity is reported. This result shows that RC4 depends heavily on its key schedule for its security. The attack complexity indicated in Table 5 is not monotonically increasing because the attack often succeeds in substantially fewer than the maximum number of steps.

Table 4. Estimated Upper Bound on the Complexity of Cryptanalysis of RC4-n

RC4         Nominal      Effective     Attack Complexity        Attack Complexity
Word Size   Key Space    Keyspace      (arbitrary keystream)    (zero keystream)
2           2^8          2^4.58        [illegible]              [illegible]
3           2^24         2^15.30       [illegible]              [illegible]
4           2^64         2^44.25       2^20                     [illegible]
5           2^160        2^117.66      2^69                     2^42
6           2^384        2^296.00      ?                        ?
7           2^896        2^716.16      ?                        ?
8           2^2048       2^1684.00     ?                        ?



Table 5. Tracking Attack Complexity for RC4 with a Weakened Key Schedule (n = 8)

γ     Effective Keyspace    Attack Complexity
15    2^40                  2^14
20    2^61                  2^19
25    2^83                  2^23
30    2^107                 [illegible]
31    2^112                 2^28
32    2^117                 2^26
33    2^122                 2^26
7 Conclusion

RC4 remains a secure cipher for practical applications. Several theoretical attacks exist, but none has been successful against commonly used key lengths. Nonetheless, tracking analysis substantially reduces the complexity of cryptanalysis compared with an exhaustive search over the maximum key length that could be specified. Tracking
analysis would show promise if it were possible to use knowledge of the actual 
key length to limit the state space to be searched. In this regard, RC4’s resilience 
is mainly due to the fact that the key schedule effectively prevents partial knowl- 
edge of the s-box state from providing information about the key. If this were 
not the case, tracking analysis would be successful even for the full-size cipher. 
Abstract. Traceability schemes for broadcast encryption are defined by Chor, Fiat and Naor in [?] to protect against a possible coalition of users producing an illegal decryption key. Their scheme was then generalized by Stinson and Wei in [?]. These schemes assume that every user can decrypt the secret value. In this paper we discuss key preassigned traceability schemes, in which only the users in a specified privileged subset can decrypt. A new scheme is presented in this paper, which has better traceability than previous schemes. We also present a new threshold traceability scheme by using a ramp scheme. All the constructions are explicit and could be implemented easily.

Keywords: key preassigned scheme, broadcast encryption, traceability, 
secret sharing schemes, combinatorial designs. 



1 Introduction 

Most networks can be thought of as broadcast networks, in that anyone connected to the network can access all the information that flows through it. In many situations, such as a pay-per-view television broadcast, the data is only available to authorized users. To prevent an unauthorized user from accessing the data, the trusted authority (TA) will encrypt the data and give the authorized users keys to decrypt it. Some unauthorized users might obtain some decryption keys from a group of one or more authorized users (called traitors). Then the unauthorized users can decrypt data that they are not entitled to. To prevent this, Chor, Fiat and Naor [?] devised a traitor tracing scheme, called a traceability scheme, which will reveal at least one traitor on the confiscation of a pirate decoder. This scheme was then generalized by Stinson and Wei in [?]. There are some other recent papers discussing this topic (see [?]).

The basic idea of a traceability scheme is as follows. Suppose there are a total of $b$ users. The TA generates a set $T$ of $v$ base keys and assigns $\ell$ keys chosen from $T$ to each user. These $\ell$ keys comprise a user's personal key, and we will denote the personal key for user $i$ by $U_i$. A broadcast message, $M$, consists of an enabling block, $B$, and a cipher block, $Y$. The cipher block is the encryption of the actual plaintext data $X$ using a secret key, $S$. That is, $Y = e_S(X)$, where $e(\cdot)$ is the encryption function for some cryptosystem. The enabling block consists of data which is encrypted by some method, using some or all of the $v$ keys in
S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 144-^^| 1999. 

(c) Springer-Verlag Berlin Heidelberg 1999 



Key Preassigned Traceability Schemes for Broadcast Encryption 145 



the base set, the decryption of which will allow the recovery of the secret key $S$. Every authorized user should be able to recover $S$ using his or her personal key, and then decrypt the cipher block using $S$ to obtain the plaintext data, i.e., $X = d_S(Y)$, where $d(\cdot)$ is the decryption function for the cryptosystem.

Some traitors may conspire and give an unauthorized user a pirate decoder, $E$. $E$ will consist of a subset of base keys such that $E \subseteq \bigcup_{i \in C} U_i$, where $C$ is the coalition of traitors. An unauthorized user may be able to decrypt the enabling block using a pirate decoder. The goal of the TA is to assign keys to the users in such a way that when a pirate decoder is captured and the keys it possesses are examined, it should be possible to detect at least one traitor in the coalition $C$, provided that $|C| \le c$ (where $c$ is a predetermined threshold).
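To make the tracing goal concrete, the sketch below applies one simple tracing rule: return the users whose personal keys overlap most with the captured decoder. The rule, the toy key assignment, and the function name are illustrative assumptions, not the scheme of this paper; whether such a rule always points at a traitor depends on how the TA assigns keys and on $|C| \le c$, and the second call shows a case where this toy assignment fails.

```python
def trace(personal_keys, decoder):
    """Return the users whose personal keys share the most base keys with the decoder.
    Assumed maximum-overlap tracing rule, for illustration only."""
    overlap = {user: len(keys & decoder) for user, keys in personal_keys.items()}
    best = max(overlap.values())
    return sorted(user for user, size in overlap.items() if size == best)


# Hypothetical toy assignment: b = 4 users, v = 6 base keys, l = 3 keys per user.
U = {1: {1, 2, 3}, 2: {1, 4, 5}, 3: {2, 4, 6}, 4: {3, 5, 6}}

E = {1, 2, 3}         # decoder built by the coalition C = {1}
print(trace(U, E))    # [1]

E2 = {1, 2, 4}        # built from U_1 and U_2, so C = {1, 2}
print(trace(U, E2))   # [1, 2, 3]: innocent user 3 ties, so this toy assignment
                      # cannot trace coalitions of size 2
```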

In all the traceability schemes discussed in [?], it is assumed that every user can decrypt the enabling block. This means that the data supplier should assign the keys after he or she has determined who the authorized users are. In practice, however, this restriction may be inconvenient, as changes between authorized and unauthorized users may be frequent.

In this paper, we investigate traceability schemes in which the personal keys 
can be assigned before the authorized users are determined. We will call these 
schemes key preassigned schemes. Key preassigned schemes (for broadcast encryption) have been discussed by several researchers. The first scheme was introduced by Berkovits in [?]. Several recent papers have studied broadcast encryption schemes (see, for example, [?]). Broadcast schemes enable a TA to broadcast a message to the users in a network so that a certain specified subset of authorized users can decrypt it. However, most of these broadcast
schemes have not considered the question of traceability. We will briefly review 
the traceability of these schemes and then give some key preassigned schemes 
which have better traceability than the previous schemes. We will also discuss 
threshold tracing schemes which are more efficient but less secure in some re- 
spect. We will use combinatorial methods to describe the schemes and give some 
explicit constructions. The efficiency of the schemes is measured by considering 
the information rate and broadcast information rate. 

There are two aspects of security in our schemes. The first property is to prevent unauthorized users from decrypting the enabling block; this is the usual question investigated in broadcast encryption. The second property is the ability to trace a pirate decoder made by a coalition of users (which, of course, may include authorized users). Although both properties protect against coalitions, they have different effects. The first property can prevent a coalition of unauthorized users from decrypting the enabling block, but it does not protect against the construction of a pirate decoder. The second property cannot prevent a coalition from decrypting the enabling block, but it enables the TA to trace at least one traitor if the decoder is found.

We will discuss schemes that are unconditionally secure (in the information-theoretic sense); that is, they do not depend on any computational assumption.
2 Definitions and notations

In this section, we give basic definitions and the notations used in this paper. 

2.1 Broadcast encryption schemes 

The definition of a broadcast encryption scheme we use in this paper is the same as the one given in [ ]. As in a traceability scheme, there is a trusted authority (TA) and a set of users $\mathcal{U} = \{1, 2, \ldots, b\}$; the TA generates a set of base keys and assigns a subset of the base keys to each user as his or her personal key. At a later time, a privileged subset $P$ of authorized users is determined. The TA chooses a secret key $S$ and broadcasts an enabling block $B_P$ (which is an encryption of $S$) that can be decrypted by every authorized user, but which cannot be decrypted by certain forbidden subsets disjoint from $P$.

Let $\mathcal{P}$ denote the collection of possible privileged subsets and let $\mathcal{F}$ denote the collection of possible forbidden subsets. In this paper, we will consider the case $\mathcal{P} = 2^{\mathcal{U}}$, so $\mathcal{P}$ contains all subsets of users, and $\mathcal{F}$ contains all $f$-subsets of users, where $f$ is a fixed integer. To keep things simple (and since we want to focus on traceability first), we will mainly consider the situation $f = 1$. In the case $\mathcal{P} = 2^{\mathcal{U}}$ and $f = 1$, the privileged subset can be any subset of users, and the enabling block cannot be decrypted by an individual unauthorized user. (It may, however, be possible for subsets of unauthorized users to jointly decrypt the message.)

For $1 \le i \le b$, let $\mathcal{U}_i$ denote the set of all possible subsets of base keys that might be distributed to user $i$ by the TA. Thus the personal key $U_i \in \mathcal{U}_i$. Let $\mathcal{S}$ denote the set of possible secret keys, so $S \in \mathcal{S}$. Let $\mathcal{B}_P$ be the set of possible enabling blocks for privileged subset $P$; thus $B_P \in \mathcal{B}_P$. Usually, $U_i$, $S$ and $B_P$ consist of tuples from a finite field $\mathbb{F}_q$. We define the information rate to be

$$\rho = \min\left\{ \frac{\log |\mathcal{S}|}{\log |\mathcal{U}_i|} : 1 \le i \le b \right\}$$

and the broadcast information rate to be

$$\rho_B = \min\left\{ \frac{\log |\mathcal{S}|}{\log |\mathcal{B}_P|} : P \in \mathcal{P} \right\}.$$

In general, to decrease the size of the broadcast, i.e., to increase $\rho_B$, it is necessary to decrease $\rho$, and vice versa. Since it is trivial to construct a broadcast encryption scheme with $\rho = 1$ and $\rho_B = 1/b$, we are mainly interested in schemes with $\rho_B > 1/b$.

2.2 Traceability

Suppose a “pirate decoder” $E$ is found. (We assume that the pirate decoder can be used to decrypt some enabling blocks.) If there exists a user $i$ such that $|E \cap U_i| \ge |E \cap U_j|$ for all users $j \ne i$, then $i$ is defined to be an exposed user. A $c$-traceability scheme is defined as follows.
Definition 2.1 Suppose any exposed user $i$ is a member of the coalition $C$ whenever a pirate decoder $E$ is produced by $C$ (so $E \subseteq \bigcup_{i \in C} U_i$) and $|C| \le c$. Then the scheme is called a $c$-traceability scheme.

When a scheme is $c$-traceable, $\mathcal{P} = 2^{\mathcal{U}}$, and the forbidden subsets consist of all $f$-subsets of users, we call it a $(c, f)$-key preassigned traceability scheme and denote it as a $(c, f)$-KPTS. For the case $f = 1$, we denote the scheme as a $c$-KPTS.

Remark. The difference between Definition 2.1 and the one in [ ] is that the size of the pirate decoder is not specified here. For example, the pirate decoder might be smaller or larger than a legitimate decoder. The only requirement is that a pirate decoder should be able to decode some enabling blocks.

A set system is a pair $(X, \mathcal{A})$, where $X$ is a set of points and $\mathcal{A}$ is a collection of subsets of $X$ called blocks. We will use set systems with the following property, which is modified from Theorem 2.2 of [ ].



Definition 2.2 A traceability scheme system is a set system $(X, \mathcal{A})$, where every block has size $k$ for some integer $k$, with the property that for every choice of $c' \le c$ blocks $A_1, A_2, \ldots, A_{c'} \in \mathcal{A}$, and for any $t$-subset $E \subseteq A_1 \cup A_2 \cup \cdots \cup A_{c'}$ where $t \ge k$, there does not exist a block $A \in \mathcal{A} \setminus \{A_1, A_2, \ldots, A_{c'}\}$ such that $|E \cap A_j| \le |E \cap A|$ for $1 \le j \le c'$. Such a system will be denoted by $(c, k)$-TSS.

In this definition, the blocks correspond to legitimate decoders and $E$ corresponds to a pirate decoder. We will be able to assume that $|E| \ge k$ because of the encryption scheme we use.
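As a sanity check on Definition 2.2, the property can be tested exhaustively on very small set systems. The Python sketch below is illustrative only: the function names are hypothetical, and the search is exponential in the block size, so it is practical only for toy examples. It returns a violating triple (coalition, $E$, $A$) if one exists.

```python
from itertools import combinations

def find_tss_violation(blocks, c, k):
    """Return (coalition, E, index of A) violating Definition 2.2, or None if the
    set system is a (c, k)-TSS. Exhaustive search: toy examples only."""
    blocks = [frozenset(A) for A in blocks]
    if any(len(A) != k for A in blocks):
        raise ValueError("every block must have size k")
    for size in range(1, c + 1):
        for coalition in combinations(range(len(blocks)), size):
            union = sorted(frozenset().union(*(blocks[j] for j in coalition)))
            for t in range(k, len(union) + 1):
                for E in map(frozenset, combinations(union, t)):
                    best_inside = max(len(E & blocks[j]) for j in coalition)
                    # A block A outside the coalition violates the definition if
                    # |E ∩ A_j| <= |E ∩ A| for every traitor block A_j.
                    for a, A in enumerate(blocks):
                        if a not in coalition and len(E & A) >= best_inside:
                            return coalition, E, a
    return None

def is_tss(blocks, c, k):
    return find_tss_violation(blocks, c, k) is None

# Fano plane, the 2-(7, 3, 1) design.
fano = [{0, 1, 3}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 0}, {5, 6, 1}, {6, 0, 2}]
print(is_tss(fano, 1, 3), is_tss(fano, 2, 3))  # True False
```

For the Fano plane the check succeeds with $c = 1$ but fails with $c = 2$: for example, traitor lines $\{0, 1, 3\}$ and $\{1, 2, 4\}$ with $E = \{0, 2, 3\}$ meet the innocent line $\{2, 3, 5\}$ twice, as often as either traitor line.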



2.3 Secret sharing schemes 

Let $\mathcal{U}$ be the set of $b$ users, let $\Gamma \subseteq 2^{\mathcal{U}}$ be a set of subsets called authorized subsets, and let $\Delta \subseteq 2^{\mathcal{U}}$ be a set of subsets called unauthorized subsets. In a $(\Gamma, \Delta)$-secret sharing scheme, the TA has a secret value $K$. The TA will distribute secret information called shares to each user of $\mathcal{U}$ in such a way that any authorized subset can compute $K$ from the shares they jointly hold, but no unauthorized subset has any information about $K$. The paper [ ] contains an introduction to secret sharing schemes.

Let $r < t \le b$. An $(r, t, b)$-ramp scheme is a secret sharing scheme in which the authorized subsets are all the subsets of $\mathcal{U}$ with cardinality at least $t$ and the unauthorized subsets are all the subsets of $\mathcal{U}$ with cardinality at most $r$. When $r = t - 1$, the ramp scheme becomes a threshold scheme, which is denoted by $(t, b)$-threshold scheme. The Shamir scheme provides a construction of a $(t, b)$-threshold scheme in which each share is an element of $\mathbb{F}_q$ and the secret is also an element of $\mathbb{F}_q$, for any prime power $q \ge b + 1$.
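To make the threshold building block concrete, here is a minimal sketch of Shamir's $(t, b)$-threshold scheme. It assumes a prime modulus $p \ge b + 1$ (so arithmetic is simply modulo $p$) rather than a general prime power $q$, and gives user $i$ the share $(i, f(i))$ for a random polynomial $f$ of degree at most $t - 1$ with $f(0) = K$. The function names are illustrative.

```python
import random

def shamir_split(secret, t, b, p, rng=random):
    """Split `secret` (an element of GF(p)) into b shares; any t of them recover it.

    Assumes p is prime with p >= b + 1, so the x-coordinates 1..b are distinct and nonzero.
    """
    if not (1 <= t <= b < p):
        raise ValueError("need 1 <= t <= b < p")
    # Random polynomial f of degree at most t - 1 with f(0) = secret.
    coeffs = [secret % p] + [rng.randrange(p) for _ in range(t - 1)]

    def f(x):
        y = 0
        for a in reversed(coeffs):  # Horner's rule
            y = (y * x + a) % p
        return y

    return [(x, f(x)) for x in range(1, b + 1)]

def shamir_recover(shares, p):
    """Recover f(0) from at least t shares (x, y) by Lagrange interpolation over GF(p)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

shares = shamir_split(42, t=3, b=5, p=101, rng=random.Random(1))
print(shamir_recover(shares[:3], 101))  # 42
```

Any $t$ of the $b$ shares determine $f$ and hence $K = f(0)$; with fewer than $t$ shares every value of $K$ remains equally likely.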
2.4 Key predistribution schemes

Fiat-Naor key predistribution schemes (or KPS) (see [ ]) are used in the KIO construction for broadcast encryption schemes given in [ ]. Let $f \le b$ be an integer. The forbidden subsets consist of all subsets of size at most $f$. In a Fiat-Naor scheme, the TA chooses a secret value $x_F$ for each possible forbidden subset $F$, and gives that value to each user in $\mathcal{U} \setminus F$. Let $P \subseteq \mathcal{U}$. The value

$$K_P = \sum_{F \cap P = \emptyset} x_F$$

is the key for the privileged subset $P$. $K_P$ can be computed by any member of $P$, but $K_P$ cannot be computed by any forbidden subset $F$ disjoint from $P$ (where
3 Traceability of previous broadcast schemes 

Since key preassigned broadcast encryption schemes were proposed in [ ], several constructions have been given. A summary of these results can be found in Stinson [ ]. In [ ], the KIO construction is described, and it is further discussed in [ ]. We will not review these schemes here; we only wish to indicate that these schemes usually do not have any traceability, or have, at most, 1-traceability. (Note, however, that if every user in a scheme has disjoint keys, then the scheme is “totally traceable”. Thus the trivial scheme in [ ] has $b$-traceability.)

Staddon first discussed the traceability of key preassigned broadcast schemes in her PhD thesis [ ]. She constructed some schemes called “OR protocols” that have higher traceability. We briefly review the OR protocols now. In OR protocols, the size of a forbidden subset is $f$ and the size of the privileged subset is $w = b - f$. These values are fixed ahead of time. The TA produces a key $K_t$ for each subset $P_t$ of $\mathcal{U}$, where $|P_t|$ = [^], and gives that key to every user in $P_t$, where $n$ is a given positive integer. When the TA wants to broadcast an enabling block for a privileged subset $P$, he uses the $n$ keys in the set

$$L_P = \{K_t : P_t \subseteq P\}$$

to encrypt it, in such a way that any user who has at least one of these $n$ keys is able to decrypt it.

It is shown in [ ] that the OR protocol construction has 6*(-\/n)-traceability for $n > 2$ and $b$ sufficiently large relative to $n$ and $f$. However, the proof is based on the assumption that the pirate decoder is always the same size as a personal
key, i.e., that it always contains 

( 

keys. This assumption may not be practical. In fact, unauthorized users who 
possess even one key might be able to decrypt the enabling block if the key 
happened to belong to the set $L_P$. Thus the OR protocol has no traceability under Definition 2.1, which allows a pirate decoder to have fewer keys than a personal key.

The traceability schemes in [ ] have the desirable property that any possible decoder must consist of keys from the base key set; otherwise it will be useless for decoding. In some other proposed schemes, an enabling block can be decrypted using keys not in the base set. In such a scheme, the traceability property is defeated. We describe the traceability scheme proposed in [ ] to illustrate this point.

In the scheme of [ ] (which is not key preassigned), the TA chooses a random polynomial

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_c x^c.$$

The TA then computes $f(i)$ and gives it to user $i$ secretly, so that the personal key of user $i$ will be $(i, f(i))$. When the TA wants to encrypt the secret key $S$, he broadcasts the enabling block $(S + a_0, a_1, a_2, \ldots, a_c)$. If a pirate decoder contains a pair $(u, f(u))$, then $u$ will be the exposed user. However, two users $i$ and $j$ can construct a pirate decoder as follows. They choose two random non-zero numbers $\alpha$ and $\beta$ (with $\alpha + \beta \ne 0$) and compute

$$b_0 = \frac{\alpha f(i) + \beta f(j)}{\alpha + \beta}, \quad b_1 = \frac{\alpha i + \beta j}{\alpha + \beta}, \quad \ldots, \quad b_c = \frac{\alpha i^c + \beta j^c}{\alpha + \beta}.$$

Since

$$a_0 = b_0 - a_1 b_1 - \cdots - a_c b_c,$$

the $(c + 1)$-tuple $(b_0, \ldots, b_c)$ can be used as a decoder. In this scenario, the traitors $i$ and $j$ cannot be exposed by the usual traitor tracing method.

4 The new scheme 

In this section, we present our traceability schemes, which use a KIO-type construction. The basic idea of the KIO construction is that the secret key is split into shares using a threshold scheme (or a ramp scheme), and then the shares are encrypted, thus forming the enabling block. Our scheme is a key preassigned broadcast encryption scheme where $\mathcal{U} = \{1, 2, \ldots, b\}$, $\mathcal{P} = 2^{\mathcal{U}}$ and $\mathcal{F}$ consists of all $f$-subsets of $\mathcal{U}$. We consider the case $f = 1$ first.

Suppose $(X, \mathcal{A})$ is a $(c, k)$-TSS, where $X = \{1, 2, \ldots, v\}$ and $\mathcal{A} = \{A_1, A_2, \ldots, A_b\}$. The block $A_j$ determines the personal key given to user $j$, for $1 \le j \le b$. For each $u \in X$, let

$$R_u = \{ j \in \mathcal{U} : u \in A_j \}.$$

The main steps in the protocol are as follows: 

1. For every set $R_u$ as defined above, the TA constructs a Fiat-Naor key predistribution scheme on user set $R_u$, with $\mathcal{F}_u = \{\{j\} : j \in R_u\} \cup \{\emptyset\}$ and $\mathcal{P}_u = 2^{R_u}$. Thus, for each $u$, $1 \le u \le v$, the TA chooses $|R_u| + 1$ secret values, denoted $x_{R_u}$ and $x_{R_u, j}$ ($j \in R_u$). These values are chosen at random from a finite field $\mathbb{F}_q$. The value $x_{R_u}$ is given to each $i \in R_u$, and $x_{R_u, j}$ is given to each $i \in R_u \setminus \{j\}$. These keys form the personal key for user $i$.

We will assume the existence of a function Index on the set of base keys such that $\mathrm{Index}(x) = j$ if $x$ is a key from the $j$th Fiat-Naor scheme. These keys might be stored as pairs, e.g., $(x_{R_u}, u)$ and $(x_{R_u, j}, u)$, so that the users know which keys are from which Fiat-Naor scheme.

2. Suppose the TA wants to encrypt the secret key $S \in \mathbb{F}_q$ for a privileged subset $P$. For the purposes of illustration, suppose $P = \{1, 2, \ldots, w\}$. The TA first uses a $(k, n)$-threshold scheme to split $S$ into $n$ shares $y_j$ ($j \in A_P$), where $A_P = \bigcup_{i \in P} A_i$ and $n = |A_P|$ (note that $n \le v$, so a $(k, v)$-threshold scheme can be used here, if desired).

3. For each $j \in A_P$, the TA computes the secret key $K_j$ of the Fiat-Naor scheme on $R_j$ for the privileged subset $R_j \cap P$, i.e.,

$$K_j = x_{R_j} + \sum_{i \in R_j \setminus P} x_{R_j, i}.$$

4. Each share $y_j$ is encrypted using an encryption function $e(\cdot)$ with key $K_j$. The enabling block consists of the list of encrypted values

$$\left( e_{K_j}(y_j) : j \in A_P \right).$$

Since the block $A_i$ of each user $i \in P$ consists of $k$ points of $A_P$, the user can compute $k$ keys $K_{i_1}, K_{i_2}, \ldots, K_{i_k}$ and then obtain $k$ shares $y_{i_1}, y_{i_2}, \ldots, y_{i_k}$. Using the reconstruction function of the threshold scheme, the user is able to recover the value of the secret key, $S$.

A user not in $P$ cannot compute any of the keys $K_j$, since the Fiat-Naor scheme is secure against individual unauthorized users. Thus, the user cannot get any information about the $n$ shares.

Now we consider traceability. Suppose a pirate decoder $E$ is found. The TA can compute the Index of the decoder as

$$\mathrm{Index}(E) = \{\mathrm{Index}(x) : x \in E\}.$$

Note that the cardinality of the set $\mathrm{Index}(E)$ is at least $k$; otherwise the decoder will be useless. The TA can then use this Index to find an exposed user, since the set system $(X, \mathcal{A})$ is a $(c, k)$-TSS.
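Tracing needs only the scheme labels of the captured keys. A minimal sketch, assuming keys are stored as `(value, u)` pairs as suggested in step 1; the helper names are hypothetical, and ties are reported as several exposed users, following the definition in Section 2.2.

```python
def index_of(decoder):
    """Index(E): labels u of the Fiat-Naor schemes whose keys appear in the pirate decoder.

    Assumes each key is stored as a (value, u) pair, as suggested in step 1."""
    return {u for _value, u in decoder}

def exposed_users(index, blocks):
    """All users i maximizing |Index(E) ∩ A_i| (ties are reported together)."""
    scores = [len(index & set(A)) for A in blocks]
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]

if __name__ == "__main__":
    bases = ([1, 10, 18, 16, 37], [36, 32, 33, 2, 20])  # the design of Example 4.1
    blocks = [frozenset((a + i) % 41 for a in base) for base in bases for i in range(41)]
    # Traitors 0 and 1 pool keys from points {1, 10, 18} of A_0 and {2, 11} of A_1.
    decoder = [(111, 1), (222, 10), (333, 18), (444, 2), (555, 11)]
    print(exposed_users(index_of(decoder), blocks))  # [0]
```

Only the labels matter for tracing; the key values themselves (here arbitrary placeholders) are never inspected.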

The information rate of this scheme is

$$\rho \ge \frac{1}{kr},$$

where $r_x$ is the number of blocks containing $x$ (i.e., $r_x = |R_x|$) and $r = \max\{r_x : x \in X\}$. The broadcast information rate is

$$\rho_B = \frac{1}{n} \ge \frac{1}{v}.$$

The following theorem summarizes the properties of the scheme.

Theorem 4.1 Suppose $(X, \mathcal{A})$ is a $(c, k)$-TSS in which $|X| = v$ and $|\mathcal{A}| = b$. Then there is a $c$-KPTS for a set of $b$ users, having information rate $\rho \ge 1/(kr)$ and broadcast information rate $\rho_B \ge 1/v$.

Remark. For the case $f > 1$, we need only change the construction of the Fiat-Naor scheme on each $R_u$ so that the possible forbidden subsets are all subsets of $R_u$ having size at most $f$. This will cause the information rate of the scheme to decrease, while the broadcast information rate remains the same.
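The general Fiat-Naor scheme of Section 2.4, with forbidden subsets of size at most $f$, can be sketched as follows (illustrative function names; arithmetic modulo a prime $q$ is an assumption). It also shows why the information rate drops for $f > 1$: each of $m$ users now stores $\sum_{s=0}^{f} \binom{m-1}{s}$ values instead of $m$.

```python
import random
from itertools import combinations

def fiat_naor_setup(users, f, q, rng):
    """Fiat-Naor KPS over Z_q (q prime, an assumption): a random x_F for every subset F
    of users with |F| <= f, including the empty set; x_F goes to every user outside F."""
    forbidden = [frozenset(F) for s in range(f + 1) for F in combinations(sorted(users), s)]
    x = {F: rng.randrange(q) for F in forbidden}
    personal = {i: {F: val for F, val in x.items() if i not in F} for i in users}
    return x, personal

def fiat_naor_key(values, P, q):
    """K_P = sum of x_F over forbidden F disjoint from P, over the values the caller holds.

    A member of P holds every such x_F (none of those F contains him or her), so the sum is
    complete; for f >= 1 an outsider j lacks x_{{j}} and cannot form the same sum."""
    return sum(val for F, val in values.items() if not (F & P)) % q

q = 2**31 - 1
x, personal = fiat_naor_setup([0, 1, 2, 3, 4], f=2, q=q, rng=random.Random(5))
P = frozenset({0, 1})
print(len(personal[0]), fiat_naor_key(personal[0], P, q) == fiat_naor_key(x, P, q))  # 11 True
```

With $f = 1$ the same code reduces to the schemes on $R_u$ used above: each of the $m$ users gets $m$ of the $m + 1$ values.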

The following small example will illustrate the scheme. 

Example 4.1 A 2-KPTS with 82 users.

Let $X = \{0, 1, \ldots, 40\}$ and suppose $\mathcal{A}$ contains the following 82 blocks, where the calculations are in $\mathbb{Z}_{41}$, for $i = 0, 1, 2, \ldots, 40$:

$$A_i = \{1 + i, 10 + i, 18 + i, 16 + i, 37 + i\}$$

$$A_{41+i} = \{36 + i, 32 + i, 33 + i, 2 + i, 20 + i\}$$

The set system $(X, \mathcal{A})$ is a $(41, 5, 1)$-balanced incomplete block design (see [ ]). This set system has the property that each pair of points appears in exactly one block, and every point appears in exactly 10 blocks. It is in fact a $(2, 5)$-TSS (see Theorem 6.2).
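The design claims in this example can be verified mechanically. The following Python check (illustrative only) builds the 82 blocks from the two base blocks and confirms that every pair of points lies in exactly one block, that every point lies in 10 blocks, that distinct blocks share at most one point, and the set $R_1$ used below.

```python
from collections import Counter
from itertools import combinations

V = 41
BASE_BLOCKS = ([1, 10, 18, 16, 37], [36, 32, 33, 2, 20])

# A_i = first base block + i and A_{41+i} = second base block + i, arithmetic in Z_41.
blocks = [frozenset((a + i) % V for a in base) for base in BASE_BLOCKS for i in range(V)]

pair_counts = Counter(pair for A in blocks for pair in combinations(sorted(A), 2))
point_counts = Counter(x for A in blocks for x in A)
max_overlap = max(len(A & B) for A, B in combinations(blocks, 2))
R1 = {j for j, A in enumerate(blocks) if 1 in A}

# 82 blocks, all 820 pairs covered exactly once, every point in 10 blocks, overlaps <= 1.
print(len(blocks), len(pair_counts), set(pair_counts.values()),
      set(point_counts.values()), max_overlap)  # 82 820 {1} {10} 1
print(sorted(R1))  # [0, 5, 24, 26, 32, 47, 50, 51, 63, 81]
```

The maximum overlap of 1 is exactly the condition $\mu = 1$ of Lemma 6.1, which with $k = 5 \ge c^2\mu + 1$ gives $c = 2$.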

The block $A_i$ is associated with user $i$. For each $u \in X$, the TA constructs a Fiat-Naor scheme on $R_u$. For example, for $u = 1$, it can be seen that

$$R_1 = \{0, 32, 24, 26, 5, 47, 51, 50, 81, 63\},$$

so $|R_1| = 10$. The TA will choose $|R_1| + 1 = 11$ secret values in $\mathbb{F}_q$ for some prime power $q$, and every user in $R_1$ will receive 10 of the 11 values. A Fiat-Naor scheme is implemented in this way on each $R_u$, and thus every user has 50 values in his or her personal key.

Now, suppose the TA wants to encrypt a secret key $S \in \mathbb{F}_q$, where the privileged subset is $P = \{0, 1, 2, \ldots, 59\}$, so $w = 60$. The TA uses a $(5, 41)$-threshold scheme to split $S$ into 41 shares, $y_0, \ldots, y_{40}$, and computes the Fiat-Naor keys $K_0, \ldots, K_{40}$ as in step 3. For example,

$$K_1 = x_{R_1} + x_{R_1, 63} + x_{R_1, 81}.$$

The enabling block will be the list of encrypted values

$$\left( e_{K_0}(y_0), \ldots, e_{K_{40}}(y_{40}) \right).$$

Any user in $P$ can decrypt the enabling block. For example, consider user 5. The block $A_5 = \{6, 15, 23, 21, 1\}$ (recall that $37 + 5 = 42 \equiv 1$ in $\mathbb{Z}_{41}$). Then user 5 obtains five of the 41 secret keys, namely $K_6$, $K_{15}$, $K_{23}$, $K_{21}$ and $K_1$, and recovers the five shares $y_6$, $y_{15}$, $y_{23}$, $y_{21}$ and $y_1$. From these five shares, $S$ can be obtained.
Any user not in $P$ cannot decrypt the enabling block. For example, let us consider user 63. If $j \notin A_{63}$, then user 63 does not have $x_{R_j}$ and cannot compute $K_j$. On the other hand, if $j \in A_{63}$, then user 63 does not have $x_{R_j, 63}$ and cannot compute $K_j$ either. Thus user 63 cannot compute any of the shares in the threshold scheme.

Finally, let us show that the scheme is 2-traceable. If a pirate decoder $E$ is found, then the TA can compute $\mathrm{Index}(E)$ as described above. $\mathrm{Index}(E)$ must contain at least 5 numbers; otherwise it cannot decode anything. Suppose that the decoder was made by two users, say $i$ and $j$. Since $\mathrm{Index}(E) \subseteq A_i \cup A_j$, it must be the case that $|\mathrm{Index}(E) \cap A_i| \ge 3$ or $|\mathrm{Index}(E) \cap A_j| \ge 3$. Since any two blocks intersect in at most one point, $|\mathrm{Index}(E) \cap A_h| \le 2$ if $h \ne i, j$. Thus user $i$ or user $j$ (or both) will be exposed users.
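The counting argument above can also be spot-checked by simulation. The sketch below is illustrative random sampling, not a proof: it draws coalitions of one or two users and random index sets of size at least 5 inside the union of their blocks, and asserts that every exposed user belongs to the coalition. The function names are hypothetical.

```python
import random

V = 41
BASE_BLOCKS = ([1, 10, 18, 16, 37], [36, 32, 33, 2, 20])
blocks = [frozenset((a + i) % V for a in base) for base in BASE_BLOCKS for i in range(V)]

def exposed(index):
    """Users whose block meets Index(E) in the maximum number of points."""
    scores = [len(index & A) for A in blocks]
    best = max(scores)
    return {i for i, s in enumerate(scores) if s == best}

def trial(rng):
    """A random coalition of at most 2 users and a random Index(E) of size >= 5 in their blocks."""
    coalition = rng.sample(range(len(blocks)), rng.choice([1, 2]))
    union = sorted(frozenset().union(*(blocks[i] for i in coalition)))
    index = frozenset(rng.sample(union, rng.randint(5, len(union))))
    return set(coalition), index

rng = random.Random(2024)
for _ in range(2000):
    coalition, index = trial(rng)
    assert exposed(index) <= coalition, (coalition, index)
print("every exposed user was a traitor")
```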

5 Threshold tracing 

In the schemes of Section 4, the Index of any pirate decoder should contain at least $k$ values; otherwise the decoder cannot get any information from the broadcast. However, as indicated in [ ] (the final version of [ ]), such security is not needed in many applications. For example, in pay-TV applications, pirate decoders that decrypt only part of the content are probably useless. Thus [ ] defined the concept of a threshold traceability scheme. In a threshold traceability scheme, the tracing algorithm can only trace decoders that decrypt with probability greater than some threshold $p$. In this section, we discuss some key preassigned threshold traceability schemes, denoted by KPTTS. Our approach is quite different from the methods used in [ ]. We will use ramp schemes to construct KPTTS.

We can obtain a ramp scheme from an orthogonal array. 

Definition 5.1 An orthogonal array $OA(t, k, s)$ is an $s^t \times k$ array, with entries from a set $Y$ of $s \ge 2$ symbols, such that in any $t$ columns, every $1 \times t$ row vector appears exactly once.

The following lemma ([ ], Chapter VI.7) provides infinite classes of orthogonal arrays, for any integer $t$.

Lemma 5.1 If $q$ is a prime power and $t \le q$, then there exists an $OA(t, q + 1, q)$.

Suppose there is an $OA(t, v + t - r, q)$ which is public knowledge. The secret information $K$ is a $(t - r)$-tuple from $\mathbb{F}_q$. The TA secretly chooses a row in the OA such that the last $t - r$ columns of that row contain the tuple $K$. It is easy to see that there are $q^r$ such rows. The TA then gives each of the $v$ users one value from the first $v$ columns of that row. Since any $t$ of these values determine a row of the OA uniquely, $t$ users can get $K$ by combining their shares. However, from any $r$ values, the users cannot obtain any information about $K$, since these $r$ values together with the last $t - r$ columns of any row in the OA determine that row. (For a more detailed description of this construction, the reader can consult [ ].)

Our KPTTS is similar to the KPTS constructed in Section 4. The only difference is that we use a $(0, k, n)$-ramp scheme to split the message into shares in a KPTTS, instead of the $(k, n)$-threshold scheme used in the KPTS.

In the KPTTS, the base key set and preassigned keys are the same as in the KPTS. However, when the TA wants to send a secret message $M \in (\mathbb{F}_q)^k$ to a privileged subset, the TA uses a $(0, k, n)$-ramp scheme to split $M$ into $n$ shares. The TA encrypts the $n$ values with the same method as in the KPTS and broadcasts the resulting list of $n$ values. As in the KPTS, any user in the privileged subset can compute $k$ keys, so he or she can obtain $k$ shares and recover $M$ from the ramp scheme, but the users not in the privileged subset cannot get any information from the encryption.
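A minimal sketch of the $(0, k, n)$-ramp scheme, assuming the standard polynomial construction of orthogonal arrays over a prime field $\mathbb{Z}_p$ (each row lists the values of one polynomial of degree less than $k$ at distinct points). Here the message occupies $k$ columns and the shares $n$ further columns; placing the message first rather than last is an inessential choice made for this sketch, and the function names are hypothetical. The helper `candidates` illustrates the count used for the decoding threshold below: from $k - 1$ shares, each of the $p$ possible values of one more share yields a different message.

```python
def interpolate(points, x, p):
    """Value at x of the unique polynomial of degree < len(points) through `points`, over GF(p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def ramp_split(message, n, p):
    """(0, k, n)-ramp scheme: the message fills columns 0..k-1 of an OA row (a polynomial);
    the n shares are that row's entries in columns k..k+n-1. Requires k + n <= p."""
    k = len(message)
    if k + n > p:
        raise ValueError("need k + n <= p distinct columns")
    row = list(enumerate(message))
    return [(col, interpolate(row, col, p)) for col in range(k, k + n)]

def ramp_recover(shares, k, p):
    """Any k shares determine the row, hence the whole message."""
    pts = shares[:k]
    return [interpolate(pts, col, p) for col in range(k)]

def candidates(known, guess_col, k, p):
    """Messages consistent with k - 1 known shares: one per guess of the share in guess_col."""
    return {tuple(ramp_recover(known + [(guess_col, g)], k, p)) for g in range(p)}

p = 13
shares = ramp_split([3, 7, 11], n=5, p=p)
print(ramp_recover(shares[2:], 3, p))                   # [3, 7, 11]
print(len(candidates(shares[:2], shares[2][0], 3, p)))  # 13
```

Two distinct polynomials of degree less than $k$ cannot agree on all $k$ message columns, which is why the $p$ guesses give $p$ different messages.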

Now suppose that a pirate decoder $E$ is found. If the size of $\mathrm{Index}(E)$ is not less than $k$, then the TA can find an exposed user as in the KPTS. When the size of $\mathrm{Index}(E)$ is less than $k$, the TA may not be able to trace the users in the coalition. So let us see what a decoder $E$ could do if $\mathrm{Index}(E)$ contains $k - 1$ values. Note that the ramp scheme is constructed from an $OA(k, k + v, q)$. For any $k - 1$ values, there are $q$ rows that contain these $k - 1$ values. Among these $q$ rows, only one row carries the secret message $M$. Hence the decoding threshold of the KPTTS is

$$p = \frac{1}{q}.$$

The information rate of the KPTTS is the same as that of the KPTS, but the broadcast information rate of the KPTTS is much better. In the KPTTS, we have

$$\rho_B = \frac{k}{n} \ge \frac{k}{v}.$$

Like the KPTS, the KPTTS is based on traceability set systems (TSS). We will discuss the construction of TSS in the next section.

6 Constructions of traceability set systems 

To construct our traceability schemes, we need to find traceability set systems. Some constructions for these types of set systems were given in [ ]; they are based on certain types of combinatorial designs. (A comprehensive source for information on combinatorial designs is Colbourn and Dinitz [ ].) We present a useful lemma for constructing TSS, and mention some applications of it.

Lemma 6.1 Suppose there exists a set system $(X, \mathcal{A})$ satisfying the following conditions:

1. $|A| = k \ge c^2\mu + 1$ for any $A \in \mathcal{A}$;

2. $|A_i \cap A_j| \le \mu$ for any $A_i, A_j \in \mathcal{A}$, $i \ne j$.

Then the set system is a $(c, k)$-TSS.
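Proof. Let $E \subseteq \bigcup_{i=1}^{c} A_i$ with $|E| \ge k$. Since $k \ge c^2\mu + 1$, there is a block $A_s$, $1 \le s \le c$, such that $|E \cap A_s| \ge c\mu + 1$ (otherwise $|E| \le c \cdot c\mu < k$). For any $A \in \mathcal{A} \setminus \{A_1, A_2, \ldots, A_c\}$, we have $|A \cap A_i| \le \mu$ for each $i$, so

$$|E \cap A| \le \left| A \cap \bigcup_{i=1}^{c} A_i \right| \le c\mu < c\mu + 1 \le |E \cap A_s|.$$

Hence, the set system is a $(c, k)$-TSS. □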

As a first application of Lemma 6.1, we give a construction using $t$-designs.

Definition 6.1 A $t$-$(v, k, \lambda)$ design is a set system $(X, \mathcal{A})$, where $|X| = v$ and $|A| = k$ for all $A \in \mathcal{A}$, such that every $t$-subset of $X$ appears in exactly $\lambda$ blocks of $\mathcal{A}$.



Theorem 6.2 Suppose there exists a $t$-$(v, k, 1)$ design. Then there exists a $(c, k)$-TSS, where $c = \left\lfloor \sqrt{(k - 1)/(t - 1)} \right\rfloor$.

Proof. Any two blocks of a $t$-$(v, k, 1)$ design intersect in at most $t - 1$ points. Apply Lemma 6.1 with $\mu = t - 1$. □

There are many results on $t$-$(v, k, 1)$ designs for small values of $t$, i.e., for $2 \le t \le 6$. See [ ] for a summary of known results. We can construct interesting TSS using designs with $t = 2$. For example, it is known that there is a 2-$(v, 5, 1)$ design for all $v \ge 5$, $v \equiv 1, 5 \pmod{20}$. These designs give rise to an infinite family of $(2, 5)$-TSS. Applying Theorem 4.1, we have the following KPTS.

Theorem 6.3 There exists a 2-KPTS for all $v \ge 5$, $v \equiv 1, 5 \pmod{20}$, for a set of $b = v(v - 1)/20$ users, having $\rho = \frac{4}{5(v - 1)}$ and $\rho_B = \frac{1}{v}$.

Note that Example 4.1 is the case $v = 41$ of the above theorem.

Similarly, we have 

Theorem 6.4 There exists a 2-KPTTS for all $v \ge 5$, $v \equiv 1, 5 \pmod{20}$, for a set of $b = v(v - 1)/20$ users, having $\rho = \frac{4}{5(v - 1)}$ and $\rho_B = \frac{5}{v}$.

A 3-$(q^2 + 1, q + 1, 1)$ design, known as an inversive plane, exists for any prime power $q$. The following result concerns the KPTS and KPTTS that can be constructed from inversive planes.

Theorem 6.5 For any prime power $q$, there exist a $c$-KPTS and a $c$-KPTTS, where $c = \left\lfloor \sqrt{q/2} \right\rfloor$, with information rate $\rho \approx 1/q^3$ and broadcast information rates $\rho_B \approx 1/q^2$ for the KPTS and $\rho_B \approx 1/q$ for the KPTTS.
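The parameters in Theorems 6.2 to 6.5 follow from standard design counts: a $t$-$(v, k, 1)$ design has $b = \binom{v}{t}/\binom{k}{t}$ blocks and $r = \binom{v-1}{t-1}/\binom{k-1}{t-1}$ blocks through each point, and with $f = 1$ each user stores $kr$ field elements. The sketch below (hypothetical function name) evaluates these quantities exactly, assuming the design exists.

```python
from fractions import Fraction
from math import comb, isqrt

def design_kpts_parameters(t, v, k):
    """Users, traceability c and rates of the KPTS/KPTTS built from a t-(v, k, 1) design."""
    b = comb(v, t) // comb(k, t)                  # number of blocks = number of users
    r = comb(v - 1, t - 1) // comb(k - 1, t - 1)  # number of blocks through each point
    # floor(sqrt(x)) == isqrt(floor(x)) for x >= 0, so integer division is safe here.
    c = isqrt((k - 1) // (t - 1))
    return {
        "users": b,
        "c": c,
        "rho": Fraction(1, k * r),      # each user stores k * r field elements (f = 1)
        "rho_B_KPTS": Fraction(1, v),   # at most v encrypted shares per broadcast
        "rho_B_KPTTS": Fraction(k, v),  # k-element message over at most v shares
    }

params = design_kpts_parameters(2, 41, 5)
print(params["users"], params["c"], params["rho"])  # 82 2 1/50
```

For $t = 2$, $v = 41$, $k = 5$ this reproduces Example 4.1 (82 users, $c = 2$, $\rho = 1/50$). For the inversive plane with $q = 7$ ($t = 3$, $v = 50$, $k = 8$) it gives 350 users, $c = 1$ and $\rho = 1/448$, of the order of $1/q^3$.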
In [ ] it is proved that there exists a threshold traceability scheme with broadcast information rate $\rho_B = O(\ldots)$. However, the proof of that result is not explicit. Our construction is explicit, and the threshold of our scheme is usually better than that of the scheme in [ ]. Also, our scheme is key preassigned.

Many other constructions of TSS can be given using combinatorial objects such as packing designs, orthogonal arrays, universal hash families, etc. The constructions are similar to those found in [ ].



7 Some remarks 

We make a couple of final observations in this section. 

— The $(c, f)$-KPTS scheme discussed in this paper is a generalization of the traceability schemes in [ ]. The schemes in [ ] are in fact the case $f = 0$ of our main construction. When $f = 0$, there is no protection against an unauthorized user decrypting the enabling block.

— Most broadcast schemes and traceability schemes in the literature are described as unconditionally secure schemes. If the encryption function $e(\cdot)$ used in the scheme in this paper is addition in a finite field $\mathbb{F}_q$, then our scheme is also unconditionally secure. However, the drawback of this unconditionally secure encryption is that the resulting KPTS and KPTTS are one-time schemes. On the other hand, if we desire only computational security, then we can replace $e(\cdot)$ by any cryptosystem that is computationally secure against a known-plaintext attack, and we will obtain a KPTS that can be used for many broadcasts. This simple modification can also be applied to other one-time schemes described in previously published papers.
Abstract. We introduce a new payment architecture that limits the power of an attacker while providing the honest user with privacy. Our proposed method defends against all known attacks on the bank, implements revocable privacy, and results in an efficient scheme that is well suited for smartcard-based payment schemes over the Internet.



1 Introduction 

Since the conception of anonymous payment schemes by Chaum in the early eighties, a lot of attention has been given to new schemes for the transfer of funds. Anonymity and its drawbacks, in particular, have been a busy area of research lately, giving rise to many solutions for balancing the need for privacy against protection from abuse. However, all of these schemes have been based on the same basic architecture as the pioneering scheme, which we suggest may be an unnecessary limitation. By rethinking the underlying privacy and fund-transfer architecture, we show that a stronger attack model can be adopted and costs curbed.

It is well known that it never pays to make any link of a chain stronger than the weakest link. What has received little, if any, attention is that it can in fact be damaging to do so. The reason is that an attacker may selectively avoid using the weakest-link components, thereby potentially enjoying protocol properties corresponding to the strongest link of the chain. More specifically, an attacker is implicitly given the option to use alternative tools in lieu of the given privacy-limiting portions of the scheme. For example, as appears to be the case for most Internet-based payment schemes, the connection is the weakest link in terms of privacy, as a user reveals his IP address to some entity when establishing a connection. An attacker could physically mail in disks and ask to have the encrypted responses posted to newsgroups, and therefore successfully hide his IP address. An honest user would not be granted this same level of privacy: his address would only be concealed by means of an anonymizer, such as a mix-network (e.g., [ ]) or a crowd [ ], both of which are based on the cooperation of some trusted participants. Therefore, if one link of the

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 157-^^| 1999. 
chain, corresponding to the privacy of one system component, is stronger than others, this potentially benefits an attacker without directly adding any value to the honest users.

We argue that it is unnecessary to employ a scheme with properties that can 
only be used by attackers. Moreover, by degrading the privacy of components 
until all components deliver the degree of privacy of the weakest link, efficiency 
improvements can be found. Thus, we can strengthen the defense against attacks 
(some of which might not be known to date) at the same time as we make the 
system less costly, without making the system less attractive to its honest users. 

An example from the literature of this type of “double improvement” is the work of Jakobsson and Yung [ ], in which the bank robbery attack (in which the attacker obtains the secret keys of the banks) was introduced and defended against, and the mechanisms employed to achieve this goal were used to improve the versatility and efficiency of the resulting scheme. Their scheme has two modes: an efficient off-line mode for the common case, and a rather costly on-line mode, to be used only after a successful bank robbery. By making sure that an attacker can never enjoy a higher degree of privacy than an honest user, and implementing mechanisms for a quorum of servers to selectively revoke privacy, we obtain the same degree of security as the on-line mode of the scheme by Jakobsson and Yung, at a cost comparable to that of the off-line mode of the same scheme.

Unlike other payment schemes with privacy, our scheme is not based on blind signatures (or variations thereof); its privacy is derived solely from the use of a mix-network. It does not allow an attacker any higher degree of privacy than honest users, but all users enjoy the following degree of privacy: as long as no quorum of bank servers is corrupted or decides to perform tracings, and no payer or payee reveals how they paid or were paid, the bank servers can only learn the number of payments to and from each account during each time period (e.g., one month). Moreover, as is appropriate, the transactions are independent in the sense that knowledge about one transaction gives no knowledge of another transaction (other than reducing the number of remaining possibilities).

We note that if one bank server and one merchant collude, then they can together remove the privacy of the transactions that this merchant was involved in.¹ Whereas this is a lower degree of protection than what many other schemes offer, it makes sense in a setting where users are not concerned with the bank potentially learning about a few of their transactions, as long as the majority of the transactions are secret. We believe that this is an adequate degree of protection to avoid profiling of users performing everyday transactions, which is a degree of privacy that society seems to agree should be granted.

The communication and computational costs for the different transactions are 
very similar to those of other schemes, but the amount of data stored by payers 
and merchants is significantly lower, making our suggested scheme particularly 
suitable for smartcard implementations. 

¹ This limitation can be avoided at the cost of a lower propagation rate, as will be discussed later.
2 Using Smartcards for Payment Schemes

Let us for a moment take a step back from what features are desirable in a 
payment scheme, and focus on what is absolutely necessary. In order not to have 
to rely on any physical assumption, we need some form of authentication and 
encryption for security and privacy. Also, in order for the user not to have to put 
excessive trust in other entities, public key solutions have to be used. Therefore, 
let us call signature generation and encryption the minimal requirements (where the user performs all or some of the computation involved in the transactions).

The most suitable and least expensive way of meeting these requirements 
in an environment with portable payment devices would be to develop and use 
a very simple smartcard with a hardware accelerator for modular multiplica- 
tions, and with a minimum of memory (since what is expensive in the design 
of smartcards is EEPROM and RAM). Such a product would be perfect for 
generating and verifying signatures (e.g., DSA [ ] or RSA [ ] signatures), and the use of a co-processor would allow very fast public-key certificate verification. It would offer the necessary operations required by all e-commerce applications. However, the use of such a smart card is at odds with existing solutions for anonymous e-commerce, since these require large amounts of storage and complicated protocols, and in many cases rely to some extent on tamper-resistance. (Note, however, that there are numerous smart-card-based payment schemes, e.g., [ ]. This is, in fact, a case in point, since these schemes either require expensive special-purpose smart cards or limit the safe usage of the payment scheme due to its reduced security.)

The main advantage of our scheme is that we overcome the major drawbacks 
related to the original e-coin paradigm: using a mix-decryption scheme to provide 
controlled anonymity, we build a counter-based scheme where the participants 
do not need to store anything but their secret keys and a small amount of user- 
related information. In our setting, all participants could therefore be represented 
as owners of an inexpensive and simple smart-card of the type discussed above. 
We demonstrate how to build a payment scheme offering privacy, using only 
features of such a minimalistic cryptographic smartcard. Our solution offers: 

— flexibility and plug-and-play ability to users: the same device enables home- 
banking, e-commerce and potentially extensions to applications such as pri- 
vacy-enhanced email, 

— strong protection to users, issuers and governments by the use of controlled 
and balanced privacy. 

3 Intuitive Approach 

Instead of using the common token approach, we will use an account-based 
approach, where a payer requests to have a payment made from his account 
to another account. The payer’s identity will be known to the bank, which will
deduct the appropriate amount from his account after a valid, authenticated
payment request has been received. However, the merchant’s identity will not be
obvious from this request. More specifically, it will be encrypted using a public
key whose corresponding secret key is held in a distributed manner by a set of
servers, which are controlled by banks and government entities. At given time
intervals, or after a certain number of payment orders have been received, these
will be collectively decrypted by the set of servers holding secret key shares. This
is done in a manner that does not reveal the relationship between any given
encrypted payment order and its corresponding cleartext payment order, the latter
of which contains the account number of the merchant to be paid. After the
collective decryption of all encrypted payment orders, all the accounts indicated
by these will be credited appropriately.
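The accounting behind this account-based approach can be illustrated with a minimal sketch. It is a toy under stated assumptions: the hypothetical `SealedOrder` class only models an encrypted payment order and provides no confidentiality at all, and `settle` stands in for the collective mix-decryption by shuffling the batch before opening it. What it shows is the order of events: the payer is debited as soon as his request is accepted, while the merchant's account only becomes visible, and is credited, when the batch is processed.

```python
import secrets


class SealedOrder:
    """Hypothetical stand-in for an encrypted payment order (no real encryption)."""

    def __init__(self, merchant_account, amount):
        self._merchant_account = merchant_account
        self._amount = amount


class Bank:
    def __init__(self, balances):
        self.balances = dict(balances)
        self.pending = []  # sealed orders awaiting the next accounting session

    def request_payment(self, payer, merchant_account, amount):
        # The payer is identified and debited immediately; the payee stays sealed.
        if self.balances.get(payer, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[payer] -= amount
        self.pending.append(SealedOrder(merchant_account, amount))

    def settle(self):
        # Stand-in for the mix-decryption: shuffle so that opened orders cannot be
        # matched to submission order, then credit each indicated merchant account.
        batch, self.pending = self.pending, []
        secrets.SystemRandom().shuffle(batch)
        for order in batch:
            account = order._merchant_account
            self.balances[account] = self.balances.get(account, 0) + order._amount


bank = Bank({"alice": 50, "bob": 30})
bank.request_payment("alice", "shop-17", 20)
bank.request_payment("bob", "shop-17", 10)
bank.settle()
print(bank.balances)  # {'alice': 30, 'bob': 20, 'shop-17': 30}
```

Total funds are conserved across `settle`, mirroring the requirement that credits never exceed debits.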

In order to increase the propagation speed between initiation of payment (the 
submission of the encrypted payment order) and the acceptance of a payment 
by the merchant, we can let the payer prove to the merchant that the payment 
order has been accepted by the bank, and that the eventual decryption of the
payment order will result in the merchant’s account being credited.

We note that in either case, there is no need for the merchant to be in 
direct contact with the bank (there is no deposit), and neither the payer nor 
the merchant needs to store a large amount of information, only enough to
generate (respectively, verify) payment orders. The main computation will be
done by the much more powerful bank servers, and without any strong online or 
efficiency requirements being imposed. 



4 Related Work 

When the concept of electronic payments was introduced, there was a strong
focus on perfect privacy. Lately, as possible government policies have started
to be considered and attacks exploiting user privacy have been discovered, the
attention has shifted towards schemes with revocable privacy.

Brickell, Gemmell and Kravitz introduced the notion of trustee-based
tracing, and demonstrated a payment scheme that implemented computational
privacy that could be revoked by cooperation between the bank and a trustee.
The concept of fair blind signatures was independently introduced by Stadler,
Piveteau and Camenisch; this is a blind signature for which the blinding
can be removed by the cooperation between the signer (the bank) and a trustee.
This type of signature was employed in payment schemes, and a smartcard-based
variation was also suggested. Methods were also introduced allowing the
trustee not to have to be on-line during the signature generation, by employing
so-called indirect discourse proofs, in which the withdrawer proves to the bank
that the trustee will later be able to trace.

In the above schemes, a cooperating bank and trustee were able to trace all 
properly withdrawn coins, but not coins obtained in other ways. The underlying 
attack model was strengthened by Jakobsson and Yung by the introduction 
of the bank robbery attack, in which an attacker compromises the secret keys 
used for signing and tracing, or forces the signers to produce signatures using an
alternate generation protocol; their proposed solution protects against this attack
and permits all coins to be traced, and any coin to be successfully blacklisted
after a short propagation delay. Also, the degree of trust needed was decreased,
by the use of methods to assure that the tracing and signing parties do not cheat
each other. This work was subsequently improved on by the distribution of the
parties and the introduction of a minimalistic method for privacy revocation
(i.e., verifying whether a given payment corresponds to a given withdrawal).

We achieve the same protection against attacks on the bank as the above 
constructions, and additionally, by shifting the model from a coin-based to an 
account-based model, make bank robberies futile: if the secret keys of the banks
should be compromised, this only allows the attacker to trace payments, and
not to mint money. Additionally, we match the level of functionality of the
improved scheme in terms of methods for tracing, and distribution of trust and
functionality. Finally, by the shift in architecture, we remove the propagation
delay for blacklisting; in fact, we avoid having to send out blacklists to
merchants altogether.

However, our scheme requires the payer to be able to communicate with
a bank or clearing center for each payment. Also, it does not offer the same
granularity of payments as the use of challenge semantics could, and
potentially requires multiple payments to match a particular amount (much like
many coin-based schemes).

On the other hand, we are able to reduce the amount of information stored 
by the payer (who only has to store his secret key.) Moreover, the merchants 
neither have to store any payment-related information (after having verified its 
correctness), nor do they ever have to connect to the bank. (In some sense, 
one might say that the deposit is performed by the payer.) This allows for a 
very efficient implementation in a smartcard based Internet setting (where the 
smartcards are used by the payers to guarantee that only authorized users can 
obtain access to funds.) 



The anonymity in our scheme is based solely on the use of a mix-network, a
primitive introduced by Chaum. We demonstrate an implementation based
on a general mix-network, and state and prove protocol properties.
We then consider the use of a particular, recently proposed type of mix-network,
and discuss how its added features can further strengthen the payment
scheme. The result is an efficient and practical payment scheme, well suited for
Internet implementation, that prevents all known attacks by ensuring that an
attacker can never obtain more privacy than an honest user is offered. The
privacy is controlled by a conglomerate of banks and ombudsmen, a quorum of
which have to cooperate in order to revoke it.

Another scheme that uses a mix-network (or some other primitive implementing
anonymous communication) is the payment scheme of Simon. That scheme is also
similar to ours in that it takes the “minimalist approach”, building an anonymous
payment scheme from a small set of simple components, and in that the account
information is to some extent kept by the bank. More specifically, Simon’s
scheme uses a mix-network to communicate preimages of function values that are
registered as having a value. When the merchant receives
such a payment, he deposits it by sending (again, using the mix-network) the 
preimage along with a new function value, to which only the merchant knows the 
preimage. The bank cancels the old function value and associates value with the 
new function value. Whereas this allows linking of payments through a chain, it 
still offers a high degree of privacy to its users. Although Simon’s scheme is not 
directly concerned with limiting the privacy an attacker can enjoy, and there are 
several privacy-related attacks that can be performed (since there is no direct 
notion of revocation of privacy), it still appears to be the case that the scheme is
secure against some of the strongest attacks (e.g., bank robbery). This is because
value is effectively represented by data stored and controlled by the bank, and
not solely by possession of a bank signature.

An interesting observation is the close relationship between our proposed
payment scheme and election schemes. More specifically, our payment scheme
can be used as an election scheme with only minor modifications;^ the resulting
election scheme implements most recently proposed features, and allows for
multi-bit votes to be cast.

5 Model 

Participants: There are five types of participants, all modeled by
polynomial-time Turing machines: payers, merchants, banks, a transaction
center, and the certification authority. The payers and merchants have accounts with
banks (but not necessarily the same banks); the transaction center processes 
transfers between accounts of payers and merchants, and is controlled by 
a conglomerate of banks; the certification authority issues certificates on 
all other participants’ public keys, and may be controlled by banks and 
government organizations. The transaction center knows for each payer’s 
public key the identity of the payer’s bank. (If a user has several banks, he 
correspondingly has several public keys, one per bank.) 

Trust: The payers and merchants trust that the banks will not steal their
money. For the privacy of a particular transaction, the payers trust that the
merchant of that transaction does not collude with bank servers or otherwise
attempt to reveal their identities. For their general privacy, the payers trust
that there is no quorum of dishonest, cooperating banks constituting a
corrupt transaction center.

Computation: We base our general scheme on the existence of a mix-network.
Such a scheme can be produced based on any one-way function. We also
present a scheme using a recently proposed type of ElGamal based
mix-network.

^ Instead of encrypting the account number of the payee, the payer/voter would
encrypt his vote.

^ It is possible to substitute some bank servers for ombudsman servers, whose aim
is to limit illegal tracing transactions by bank servers. These will not have to be
trusted with funds, and only hold tracing keys. Due to a shortage of space, we do
not elaborate on how exactly to implement these.

Mix-Based Electronic Payments



6 Requirements 

We will require the following from our scheme: 

Unforgeability: It is not possible for a coalition of participants not including 
a quorum of bank servers to perform valid payments for a value v exceeding 
the value charged to these participants. 

Impersonation safety: It is not possible for a coalition of cheating partic- 
ipants to perform a transaction resulting in an honest participant being 
charged more than what he spent. 

Overspending blocking: It is not possible for a coalition of participants to 
perform payments that are accepted by merchants as valid, for an amount 
exceeding their combined limits, or exceeding what they will be charged for. 

Payment blocking: It is always possible for a user to block all payments from 
his account, going into effect immediately. 

Revocability: The bank of any user can restrict the rights of the user to per- 
form payments. If a user has no bank agreeing to a payment, the payment 
will not be performed. 

Framing-freeness: It is not possible for a coalition of participants not including
an honest user to produce a transcript corresponding to a payment order 
from the honest user. 

Uniform anonymity: The probability for any coalition of participants, not
including a quorum of bank (and ombudsman) servers, to determine from
whom a payment was made and to whom (assuming that neither of these
parties collaborates with the same servers) is at most negligibly better than a
guess made uniformly at random from all possible pairs, given all pairs of payers
and merchants corresponding to the input and output of the mix-network.
It is not possible for a coalition of participants to obtain a higher degree of
anonymity by forcing alternative protocols to be used.

Traceability: Any quorum of bank servers is able to perform the following
actions:

1. Given identifying information about a payer (and possibly a particular
payment transaction of the payer), establish the identity of the receiver
of the corresponding payment(s).

2. Given identifying information about a merchant (and possibly a particu- 
lar payment transaction to the merchant), establish the identities of the 
corresponding payer(s). 

3. Given identifying information about a payer; a particular payment trans- 
action from the same; a merchant; and a particular payment transaction 
to the same, establish whether these correspond to each other. 

As noted before, some of these may be controlled by an ombudsman. 



Markus Jakobsson and David M'Raïhi



7 Quick review: Mix-networks 

Before presenting our scheme, let us briefly review what a mix-network is. In
general terms, it is a set of servers that serially decrypt and permute lists of
incoming encrypted messages. Here, the messages are either encrypted using all
the individual public keys of the servers, or using one public key, where the
corresponding secret key is shared by the mix servers.

The scheme implements privacy as long as at least one of the active mix
servers does not reveal what random permutation he applied, and the encryption
scheme is probabilistic, so that, given the decrypted messages that constitute
the output, it is not possible to recompute the encrypted messages that
constituted the input.
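To make the decrypt-and-permute description concrete, here is a minimal sketch of the second variant: a single ElGamal public key whose secret key is split additively among the servers. Each server strips its own share, re-randomizes the ciphertext under the shares that remain, and shuffles, so the last server outputs plaintexts. This is one textbook way to build such a decryption mix, not the construction of any paper cited here; the group parameters are toy values chosen for readability, the correctness proofs that a robust mix-network needs are omitted, and `pow` with a negative exponent assumes Python 3.8 or later.

```python
import secrets

# Toy group (insecure sizes): p = 2q + 1 is a safe prime and g = 4 has prime order q.
P, Q, G = 2039, 1019, 4
_rng = secrets.SystemRandom()


def rand_exp():
    return secrets.randbelow(Q - 1) + 1  # uniform in [1, q - 1]


def encrypt(m, y):
    """ElGamal encryption of the group element m under the joint public key y."""
    r = rand_exp()
    return pow(G, r, P), m * pow(y, r, P) % P


def mix_round(batch, x_i, y_rest):
    """One mix server: remove share x_i, re-randomize under y_rest, then permute.

    y_rest = g^(sum of the shares of the servers after this one); it equals 1 for
    the last server, whose output therefore carries the plaintexts.
    """
    out = []
    for a, b in batch:
        b = b * pow(a, -x_i, P) % P      # partial decryption with this server's share
        s = rand_exp()                   # fresh randomness unlinks input from output
        out.append((a * pow(G, s, P) % P, b * pow(y_rest, s, P) % P))
    _rng.shuffle(out)                    # this server's secret permutation
    return out


def mix_decrypt(batch, shares):
    for i, x_i in enumerate(shares):
        y_rest = pow(G, sum(shares[i + 1:]) % Q, P)
        batch = mix_round(batch, x_i, y_rest)
    return [b for _, b in batch]


shares = [rand_exp() for _ in range(3)]          # one secret share per mix server
y = pow(G, sum(shares) % Q, P)                   # joint public key
orders = [pow(G, k, P) for k in (11, 22, 33)]    # orders encoded as group elements
outputs = mix_decrypt([encrypt(m, y) for m in orders], shares)
print(sorted(outputs) == sorted(orders))         # True; the output order is random
```

Re-randomization is what makes the scheme probabilistic in the sense above: without it, the first ciphertext component would pass through unchanged and link every input to its output.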

Some of the schemes introduced are robust, meaning that it is not possible for 
some subset of cheating servers participating in the mix-decryption to make the 
final output incorrect (without this being detected by the honest participating 
servers.) 

We present a more detailed explanation of one particularly useful class of
mix-networks in Section 9.



8 Paying and Tracing (General Version) 

8.1 Setup 

For each time interval between accounting sessions (in which the payment
orders get decrypted and the merchants credited) a new public key may be used
(we elaborate later on when this is necessary). These public keys can
be broadcast beforehand. The corresponding secret keys are known only to the
transaction center, and are kept in a distributed manner so that any quorum of
mix-servers can compute them.
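The setup leaves open how a period's secret key is kept "in a distributed manner". One standard technique for a k-out-of-n quorum is Shamir secret sharing, sketched below under assumed parameters (a Mersenne prime field and a trusted dealer). In an ElGamal setting the shares would instead live modulo the group order q, and the threshold mix-networks of Section 9 avoid reconstructing the key altogether, so this sketch only illustrates the quorum property itself.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as the share field (assumed, for illustration)


def share(secret, k, n):
    """Split secret into n shares (i, f(i)) so that any k of them recover it."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule for f(x) = sum(c_j * x^j)
            acc = (acc * x + c) % PRIME
        return acc

    return [(i, f(i)) for i in range(1, n + 1)]


def reconstruct(subset):
    """Lagrange interpolation of f at x = 0 from at least k shares."""
    secret = 0
    for i, y_i in subset:
        num, den = 1, 1
        for j, _ in subset:
            if j != i:
                num = num * (-j) % PRIME
                den = den * (i - j) % PRIME
        secret = (secret + y_i * num * pow(den, -1, PRIME)) % PRIME
    return secret


period_key = secrets.randbelow(PRIME)
key_shares = share(period_key, k=3, n=5)
print(reconstruct(key_shares[:3]) == period_key)                              # True
print(reconstruct([key_shares[0], key_shares[2], key_shares[4]]) == period_key)  # True
```

Any three of the five hypothetical mix-servers suffice, while two shares reveal nothing about the key.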

Each party is associated with a public key, which is registered with the transac- 
tion center if the party in question is authorized to perform payments. Only this 
party knows the corresponding secret key, which is used to sign payment orders 
and possibly other data which needs to be authenticated. The signature scheme 
employed for this is assumed to be existentially unforgeable. 



8.2 Paying 

There are three phases of the payment scheme: negotiation, initiation of payment, 
and completion of payment. In the first, the payer and the merchant produce a 
description of the transaction; in the second, the transaction is committed
to and the payer’s account debited; and in the third, the merchant’s account is
credited accordingly. Note that the transaction becomes binding already in the
second phase, so the transfer of the merchandise bought can be initiated right
after the second phase has finished, and the third phase may be performed at a
later point. This is rather similar to the credit card scenario, in which
the payer commits to the transfer well before the merchant receives the funds.
1. Negotiation

In the negotiation phase, the payer and the merchant agree on an exchange, 
i.e., a description of the merchandise, how it will be delivered, the price of the 
merchandise, etc. We let m be a hashed-down description of this contract. 
Furthermore, the merchant gives the payer his account number a (which
specifies the bank as well as identifies the merchant’s account in this bank)
and a serial number s for this transaction. If the payment will consist
of several coins, a suite $s_1, \ldots, s_k$ of such serial numbers is given, each
one representing one partial payment. A payment order o is of the form m|a|s.
(If a suite of serial numbers is used, then a corresponding suite of payment
orders $o_1, \ldots, o_k$ will be generated.)

2. Initiation of Payment 

(a) The payer encrypts a batch of payment orders $o_1, \ldots, o_n$ using the public
key encryption scheme of the transaction center (which corresponds to
the mix-network). These payment orders do not have to correspond to
one transaction only, or have the same denomination, but may stem from
multiple simultaneous transactions and be of different values. The result
is a batch of encrypted payment orders $\bar{o}_1, \ldots, \bar{o}_n$. If different
denominations are supported in the system, a description d is appended, describing
what the values of the different transactions are. If only one denomination
is supported, d is the empty string. The payer signs $\bar{o}_1, \ldots, \bar{o}_n, d$
using his private key. The resulting signature is $\sigma$. The payer sends
$\bar{o}_1, \ldots, \bar{o}_n, d, \sigma$ to the transaction center.

(b) The transaction center verifies the signature $\sigma$. One of two approaches is
taken in order to avoid replay attacks: either the payer needs to prove
knowledge of some part of the posted message in a way that depends
on time, or the transaction center keeps the posted messages in a sorted
list and ignores any repeated posting. If the latter approach is taken, the
payment orders are encrypted using a new public key for each payment
period (the time in between two accounting phases).

The transaction center may then verify that the payer has sufficient
funds. This may be required by certain banks, or for amounts above
some threshold; alternatively, verifications may be performed randomly
or in batches to lower the processing times and costs.

Each valid encrypted payment order $\bar{o}_i$ is added to an internal list of
payments to be performed; there is one such list for each denomination.
The payers’ accounts are debited accordingly, either immediately or in
batches. The transaction center signs each individual valid encrypted
payment order $\bar{o}_i$, resulting in a signature $\sigma_i$. The transaction center
uses different public keys for different denominations, or appends a
description of the denomination before signing. It sends $\sigma_1, \ldots, \sigma_n$
to the payer.



^ Suites of transaction orders corresponding to the same transaction may be signed
together to improve efficiency. 
(c) The payer sends a suite of signed, encrypted payment orders (of the
format $(\bar{o}_i, \sigma_i)$) to the corresponding merchant. He also sends a proof of
what the encrypted messages are (in some cases, this amounts simply to
revealing the plaintext).

(d) The merchant verifies that $\sigma_i$ is the transaction center’s signature on the
encrypted payment order $\bar{o}_i$, and that the decrypted value $o_i$ is of the form
m|a|s for a valid merchandise description m, the merchant’s account number
a, and a sequence number s not yet received.

If the above verification goes through, then the merchant stores the se- 
quence number, and delivers the merchandise purchased. 

3. Completion of Payment 

At given intervals, the transaction center decrypts all the payment orders 
using the mix-decryption scheme employed. The result is a plaintext list of 
payment orders for each denomination. The items of these lists indicate what 
accounts are to be credited, and by how much. The banks corresponding to 
the accounts indicated credit these accounts accordingly. 
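The transaction center's side of steps 2(a) and 2(b) can be sketched as follows. Several choices here are assumptions rather than part of the text: a toy Schnorr signature over a tiny group serves both for the payer's signature $\sigma$ and for the center's per-order signatures $\sigma_i$ (the paper only requires an existentially unforgeable scheme, and these parameters are insecure); replay protection follows the "keep the postings and ignore repeats" approach; each posting carries a single denomination; and encrypted payment orders are treated as opaque values. Class and method names are hypothetical.

```python
import hashlib
import secrets

P, Q, G = 2039, 1019, 4  # toy Schnorr group (insecure sizes): g has prime order q mod p


def h(r, msg):
    digest = hashlib.sha256(f"{r}|{msg!r}".encode()).digest()
    return int.from_bytes(digest, "big") % Q


def sign(x, msg):
    k = secrets.randbelow(Q - 1) + 1
    r = pow(G, k, P)
    return r, (k + x * h(r, msg)) % Q


def verify(y, msg, sig):
    r, s = sig
    return pow(G, s, P) == r * pow(y, h(r, msg), P) % P


class TransactionCenter:
    def __init__(self):
        self.x = secrets.randbelow(Q - 1) + 1   # center's signing key
        self.y = pow(G, self.x, P)              # center's public key
        self.payer_keys = {}                    # payer id -> registered public key
        self.balances = {}                      # payer id -> funds
        self.postings = set()                   # postings seen in this payment period
        self.pending = []                       # encrypted orders for the next mix

    def submit(self, payer, enc_orders, value, sig):
        """Return the receipts sigma_1..sigma_n, or None if the posting is rejected."""
        posting = (tuple(enc_orders), value)
        if posting in self.postings:
            return None                         # repeated posting: ignored
        if not verify(self.payer_keys[payer], posting, sig):
            return None
        cost = value * len(enc_orders)
        if self.balances[payer] < cost:
            return None
        self.postings.add(posting)
        self.balances[payer] -= cost            # debit the payer at once
        self.pending.extend(enc_orders)
        # One center signature per encrypted order, bound to its denomination.
        return [sign(self.x, (c, value)) for c in enc_orders]


tc = TransactionCenter()
x_alice = secrets.randbelow(Q - 1) + 1
tc.payer_keys["alice"], tc.balances["alice"] = pow(G, x_alice, P), 100
batch = [(1234, 567), (890, 12)]     # placeholders for ElGamal ciphertexts
receipts = tc.submit("alice", batch, 10, sign(x_alice, (tuple(batch), 10)))
print(tc.balances["alice"])          # 80
```

Because the postings set is cleared only when a new period key is introduced, a re-sent posting, even with a freshly randomized signature, is rejected within the period.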

Remark 1: If steps 2c and 2d are excluded, we can avoid the privacy assumption 
that the merchant is not colluding with a bank server. The cost for this is a lower 
propagation speed, i.e., the purchase cannot be completed until the
bank servers have decrypted the payment order batch.

Remark 2: Note that the payment orders can be generated by the payers
without any real interaction between the payer and the merchant. This is possible
if m is a publicly available description (e.g., the hash of an advertisement sent
out by the merchant), a is publicly available as well, and $s_1, \ldots, s_k$ are selected
uniformly at random from a space that is large enough to avoid collisions.
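Remark 2 can be sketched as follows: the payer derives m from a public advertisement, takes the merchant's public account number a, and draws the serial numbers at random. Assumptions not found in the text: SHA-256 as the "hashed-down description", 128-bit serials, and `|` as a separator that may not occur inside any field. The second function quantifies "large enough to avoid collisions" with the standard birthday approximation $p \approx 1 - e^{-n(n-1)/2^{b+1}}$ for n serials of b bits.

```python
import hashlib
import math
import secrets


def payment_orders(advertisement, account, k):
    """Payer-side generation of k orders o_i = m|a|s_i with no merchant interaction."""
    if "|" in account:
        raise ValueError("the '|' separator may not appear inside a field")
    m = hashlib.sha256(advertisement.encode("utf-8")).hexdigest()
    # Each s_i is a uniformly random 128-bit serial, written as 32 hex digits.
    return [f"{m}|{account}|{secrets.token_hex(16)}" for _ in range(k)]


def collision_probability(n, bits):
    """Birthday approximation: chance that n random `bits`-bit serials are not distinct."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


orders = payment_orders("Widget, 1 unit, 10 EUR", "BANK7-0042", 3)
print(len(set(orders)))                      # 3 (distinct serials)
print(collision_probability(10**9, 128))     # roughly 1.5e-21
```

Even a billion serials drawn from a 128-bit space collide with negligible probability, whereas 64-bit serials already collide with probability about 0.39 after $2^{32}$ draws.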

Remark 3: In order to limit the amount of storage needed by the merchant,
the sequence numbers may contain a date and time stamp, and will only be
accepted (and stored) within an interval associated with this timestamp.
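The merchant's checks in steps 2(c) and 2(d), with the storage bound of Remark 3, might look like the sketch below. Assumptions: the serial number carries its timestamp as a prefix, in the format `<unix-time>-<random>`; the window length is a merchant-chosen parameter; and the two hypothetical callbacks `center_sig_ok` and `decryption_proof_ok` stand in for verifying $\sigma_i$ and the payer's proof of the plaintext, which depend on the concrete signature and encryption schemes. A serial is stored only while its timestamp lies inside the window; outside the window it is rejected anyway, so forgetting it cannot enable a replay.

```python
class Merchant:
    def __init__(self, account, expected_m, window, center_sig_ok, decryption_proof_ok):
        self.account = account                  # a: this merchant's account number
        self.expected_m = expected_m            # m: hash of the agreed contract
        self.window = window                    # seconds during which a serial is valid
        self.center_sig_ok = center_sig_ok      # hypothetical: checks sigma_i on o-bar_i
        self.decryption_proof_ok = decryption_proof_ok  # hypothetical: o_i vs o-bar_i
        self.seen = {}                          # serial -> timestamp, within window only

    def accept(self, enc_order, center_sig, plaintext, now):
        if not (self.center_sig_ok(enc_order, center_sig)
                and self.decryption_proof_ok(enc_order, plaintext)):
            return False
        try:
            m, a, s = plaintext.split("|")
            stamp = int(s.split("-", 1)[0])     # assumed serial format "<unix-time>-<random>"
        except ValueError:
            return False
        if m != self.expected_m or a != self.account:
            return False
        # Remark 3: forget serials whose timestamps have left the window.
        self.seen = {ser: t for ser, t in self.seen.items() if now - t <= self.window}
        if not 0 <= now - stamp <= self.window or s in self.seen:
            return False
        self.seen[s] = stamp                    # the merchandise can now be delivered
        return True


def always_ok(*_args):
    return True  # placeholder for the hypothetical signature and proof checks


shop = Merchant("BANK7-0042", "c0ffee", 600, always_ok, always_ok)
print(shop.accept("ct", "sig", "c0ffee|BANK7-0042|1000-9f2a", now=1010))  # True
print(shop.accept("ct", "sig", "c0ffee|BANK7-0042|1000-9f2a", now=1020))  # False: replay
```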

8.3 Tracing 

If a general mix-network is employed, only the two basic tracing operations can
be performed efficiently: tracing from a payer to a payment order, and from a
payment order to a payer. We will later look at how this can be extended (and
simplified) for an ElGamal based mix-network.

1. Payer → Payment Order

The trace is performed simply by decrypting the encrypted payment order
$\bar{o}$ in question, arriving at the plaintext payment order o, which specifies the
account to which the payment is performed.

2. Payment Order → Payer

The trace is performed as follows: the mix-servers reverse the computation
of o from $\bar{o}$ step by step, proving to each other that each step is valid.
Depending on what type of mix-network is employed, it may be necessary to
keep intermediary values, such as partial decryptions and the permutations
employed; if these are not available, then they can be re-generated by
decrypting the entire bulletin board again.
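For a general mix-network, the Payment Order → Payer trace amounts to walking the servers' permutations backwards. The sketch below isolates that bookkeeping: it omits decryption and the inter-server validity proofs entirely (items are plain labels), and assumes that each server logged its permutation, which is one of the intermediary values mentioned above. A seeded `random.Random` is used only so that the example is reproducible; a real mix server would need a cryptographically secure permutation.

```python
import random


def mix_with_log(items, n_servers, rng):
    """Apply n_servers secret permutations; return the output and each server's log.

    In a log, perm[j] is the input position of the item a server put at output position j.
    """
    logs = []
    for _ in range(n_servers):
        perm = list(range(len(items)))
        rng.shuffle(perm)
        items = [items[j] for j in perm]
        logs.append(perm)
    return items, logs


def trace_back(out_index, logs):
    """Payment order -> payer: follow one output position back through every server."""
    idx = out_index
    for perm in reversed(logs):
        idx = perm[idx]
    return idx


payers = ["alice", "bob", "carol", "dave"]       # input order = posting order
output, logs = mix_with_log(payers, n_servers=3, rng=random.Random(7))
j = 2                                            # position of some decrypted payment order
print(output[j] == payers[trace_back(j, logs)])  # True
```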



9 Paying and Tracing (ElGamal Version) 

If the recently proposed type of mix-decryption scheme (different methods
independently proposed by Abe, by Jakobsson, and by Ogata, Kurosawa,
Sako, and Takatani) is used by the transaction center, several improvements
become possible. Let us first briefly describe this mix-decryption scheme,
and then elaborate on the advantageous consequences of using it.

1. The input to the scheme is a list of ElGamal encrypted messages (encrypted
with the public key of the transaction center, i.e., mix center.) The output 
is a permuted list of the corresponding decrypted values. 

2. The decryption process can be performed by any k out of n servers, who share 
the secret key for decryption. The participants with whom the encryptions 
originated need not be involved. 

3. No subset of less than k out of n mix servers can perform the decryption. 

4. The decryption process is robust, meaning that if any server(s) should cheat, 
then the cheating will be detected, and their identities become known to the 
remaining servers. 

5. No subset of cheating servers can correlate items of the input to items of the 
output (unless they already know the corresponding plaintext messages.) 

6. The decryption process is efficient. 
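Properties 2 and 3 above (any k of n servers can decrypt, fewer cannot) are typically obtained with threshold ElGamal decryption. The sketch below assumes a trusted dealer who Shamir-shares the secret key modulo the group order q, and uses toy group parameters. Each participating server publishes $a^{x_i}$, and the partial results are combined with Lagrange coefficients in the exponent, so the key itself is never reconstructed. The proofs of correct partial decryption that provide robustness (property 4), and the re-encryption and permutation steps of the mix, are not shown.

```python
import secrets

P, Q, G = 2039, 1019, 4  # toy group: p = 2q + 1, g of prime order q (insecure sizes)


def deal(x, k, n):
    """Trusted-dealer Shamir sharing of x mod q; server i receives f(i)."""
    coeffs = [x] + [secrets.randbelow(Q) for _ in range(k - 1)]
    return {i: sum(c * pow(i, e, Q) for e, c in enumerate(coeffs)) % Q
            for i in range(1, n + 1)}


def lagrange_at_zero(i, ids):
    num, den = 1, 1
    for j in ids:
        if j != i:
            num = num * (-j) % Q
            den = den * (i - j) % Q
    return num * pow(den, -1, Q) % Q


def partial_decrypt(a, x_i):
    return pow(a, x_i, P)  # server i publishes a^(x_i)


def combine(b, partials):
    """Recover m = b / a^x from k partial decryptions {i: a^(x_i)}."""
    ids = list(partials)
    a_x = 1
    for i, d_i in partials.items():
        a_x = a_x * pow(d_i, lagrange_at_zero(i, ids), P) % P
    return b * pow(a_x, -1, P) % P


x = secrets.randbelow(Q - 1) + 1
y = pow(G, x, P)
shares = deal(x, k=3, n=5)
m, r = pow(G, 123, P), secrets.randbelow(Q - 1) + 1
a, b = pow(G, r, P), m * pow(y, r, P) % P
quorum = {i: partial_decrypt(a, shares[i]) for i in (2, 4, 5)}
print(combine(b, quorum) == m)  # True
```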

If this scheme is employed, the following holds: 

1. As long as there exists a quorum of honest mix-servers (controlled by the
banks), the decryption process will be possible.^

2. As long as at least one of the participating mix-servers is honest, the
correctness of the output is guaranteed.^

3. As long as there is no dishonest quorum of mix-servers, the privacy of users 
will be guaranteed. 

The tracing scheme can be simplified as well, and makes it unnecessary for the 
banks to store the contents of the bulletin board, or intermediary computation, 
in order to perform traces. The following three types of tracing can be performed: 

1. Payer → Payment Order

The trace is performed simply by decrypting the encrypted payment order
$\bar{o}$ in question, arriving at the plaintext payment order o, which specifies the
account to which the payment is performed.

^ This is an important issue, since otherwise it would be possible for one corrupt bank
or ombudsman (controlling one mix-server) to stop all payments in one payment 
period (unless the payers volunteer to repeat their payments!) 

^ This is important, or a subset of mix-servers could manipulate the payments without 
being detected. 
2. Payment Order → Payer

Here, the given decrypted payment order is to be compared to the encrypted
input items. When a match is found, the corresponding payer identifier is
output. This can be done by first blinding both the payment order and
all the input items^ using the same distributively held blinding factor, and
then decrypting all the blinded input items, after which the blinded payment
order is matched against them. If the mix-decryption scheme by Jakobsson is
used, then the trace can be performed by computing a tag, corresponding to
the input to the mix-network, and comparing this tag to the partially decrypted
(but still blinded) encrypted payment orders. We refer to the description of
that scheme for more details.

3. (Payment Order, Payer) → (yes/no)

This trace is performed by computing the above tag from the encrypted
payment order in question, and verifying whether this tag corresponds to the
decrypted payment order (without revealing any other information than this
one bit). This can be done using the verification protocol for undeniable
signatures or using a small modification of the first method described
above.
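The blinding-based Payment Order → Payer trace can be sketched with toy ElGamal, using the blinding $(a, b) \mapsto (a^{\delta}, b^{\delta})$, which turns an encryption of m into an encryption of $m^{\delta}$. For readability a single party holds both the decryption key x and the blinding factor $\delta$; in the scheme both are held distributively, so the servers only ever see blinded values. Payment orders are encoded as elements of the prime-order subgroup, where $m \mapsto m^{\delta}$ is one-to-one for $1 \le \delta < q$, so distinct orders cannot produce false matches. The group parameters and function names are illustrative assumptions.

```python
import secrets

P, Q, G = 2039, 1019, 4  # toy group: p = 2q + 1, g of prime order q (insecure sizes)


def encrypt(m, y):
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), m * pow(y, r, P) % P


def decrypt(ct, x):
    a, b = ct
    return b * pow(a, -x, P) % P


def blind(ct, delta):
    a, b = ct
    return pow(a, delta, P), pow(b, delta, P)  # an encryption of m^delta


def trace_to_payer(order, board, x, delta):
    """Compare the blinded order with every decrypted, blinded input item."""
    target = pow(order, delta, P)
    for payer, ct in board:
        if decrypt(blind(ct, delta), x) == target:
            return payer
    return None


x = secrets.randbelow(Q - 1) + 1
y = pow(G, x, P)
orders = {"alice": pow(G, 10, P), "bob": pow(G, 20, P), "carol": pow(G, 30, P)}
board = [(payer, encrypt(o, y)) for payer, o in orders.items()]  # bulletin board
delta = secrets.randbelow(Q - 1) + 1
print(trace_to_payer(orders["bob"], board, x, delta))  # bob
```

Decrypting blinded items yields only $m^{\delta}$, which is why the other payment orders on the board stay hidden during the trace.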



10 Claims 

The basic scheme satisfies the following requirements (for a quorum size of all
participating bank servers, who are assumed to follow the protocol, although
they may be curious to learn additional information): unforgeability,
impersonation safety, overspending blocking, payment blocking / revocability,
uniform anonymity, and framing-freeness. Additionally, the first two modes of
tracing detailed under traceability are possible.

The ElGamal based scheme satisfies (without any condition on quorum size 
or behavior of non-cooperating servers) the above requirements, and full trace- 
ability. We prove these claims in the Appendix. 

11 Performance Analysis 

This section demonstrates that our proposed scheme is sufficiently efficient to 
be practical in a mobile setting, in which the user’s device is a portable token, 
such as a smart-card with restricted capacities in terms of computational power 
and memory storage. 

Although our solution is also competitive in a PC-based setting, it is the mobile
setting that is the most restrictive. We note also that whereas good
performance of the mix-decryption is important, it is not vital for the efficiency
of the scheme, since the mix-decryption is performed off-line and in batches,
does not involve the payer or merchant, and does not require them to wait for
the result of the operation. In fact, the employment of the mix-network will be
transparent to the users in the case where a threshold-based scheme (such as
ElGamal) is used. (For a treatment of the efficiency of the mix-decryption
schemes in question, we refer to the corresponding papers where these are
treated in detail.)

^ An ElGamal encryption $(a, b)$ can be blinded using a blinding factor $\delta$ by
computing $(a^{\delta}, b^{\delta})$. This results in a blinding of the plaintext message
m to $m^{\delta}$, but in encrypted form.

Table 1 summarizes the computational performance of two smartcards with
cryptographic accelerators: the SLE66CX160S, the new smartcard product from
Siemens, used for instance in the Gemplus GPK range, and the Hitachi H8/3113
enhanced chip for smartcards. The performance of the Siemens SLE66CX160S has
been carefully evaluated, running routines on an emulator at a 3.68 MHz clock
frequency and with the worst-case set of parameters (all the exponent bits set).



Data length l (bits)    512       768       1024
H8/3113                 90 ms     350 ms    700 ms
SLE66CX160S             186 ms    593 ms    1045 ms

Table 1. Exponentiation timings (exponent length = l)



Let us now consider a full implementation of the scheme. The smartcard used
is based on the Siemens chip (GPK card), giving us timings of around 300 ms for
the 1024-bit RSA signature (with CRT) and 200 ms for the ElGamal encryption.
The total time expected for processing a payment order is approximately the
time required for encrypting and signing data; the communication time for
processing such transactions is around 100 ms when transmitted according to
the ISO 7816-4 standard at 115,200 bd. Since the user does not need to update a
transaction log, writing to EEPROM (which is quite time consuming) is avoided.

We performed various practical tests confirming that the total time for issuing
a payment order should be around 1 second, as detailed in Table 2. We assume
that:

1. the time spent for checking at merchant and center is negligible (we consider
a protocol overhead of 100 ms)

2. communication is performed at 115,200 bd

3. 512-bit moduli for ElGamal and 1024-bit moduli for RSA are practical
and secure parameter sizes
Operation                       Processing Time
512-bit ElGamal Encryption      227 ms
Hashing Ciphertext (SHA)        32 ms
1024-bit RSA Signature          289 ms
Sending to center               110 ms
Sending to merchant             110 ms
Checking Overhead               100 ms
Total                           868 ms

Table 2. Transaction Time Evaluation
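The total in Table 2 is the sum of the sequential steps. A quick check of the arithmetic, with labels copied from the table; grouping the first three rows as "cryptographic computation" is our own reading of the table, not a breakdown given in the text.

```python
timings_ms = {
    "512-bit ElGamal encryption": 227,
    "Hashing ciphertext (SHA)": 32,
    "1024-bit RSA signature": 289,
    "Sending to center": 110,
    "Sending to merchant": 110,
    "Checking overhead": 100,
}
total_ms = sum(timings_ms.values())

crypto_steps = ("512-bit ElGamal encryption", "Hashing ciphertext (SHA)",
                "1024-bit RSA signature")
crypto_ms = sum(timings_ms[step] for step in crypto_steps)

print(total_ms)                                 # 868, below the 1 second target
print(crypto_ms, f"{crypto_ms / total_ms:.0%}")  # 548 63%
```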
Appendix: Proofs of Claims

We outline the proofs of the claims stated in Section 10; where applicable, the
proofs have one part relating to the general scheme (assuming that all mix servers
of the mix-network used cooperate) and one to the ElGamal scheme (assuming a
quorum of specified size to cooperate). The final version will include full proofs,
which are kept short here due to space shortage.

Theorem 1: The schemes implement unforgeability. 

Proof of Theorem 1: (Sketch) 

First, in order to post a payment order that will be accepted, a valid signature of 
the payer has to be produced; by the soundness of the underlying existentially 
unforgeable signature scheme, this can only be done by the account owner in 
question. Second, in order for a party to be credited, it is necessary that one 
of the output items of the mix-network specifies their account number. If a
general mix-network is used, the mix-servers are assumed to perform the valid
decryption (if they do not, they can only alter who gets credited, and not the
number of payments performed); if the suggested ElGamal mix-network is used,
then by the robustness of this, the outputs are guaranteed to be valid if at least
one participating mix server is honest. Therefore, only the participating payers
will be charged, and only the intended payees credited. The amount of the credits
cannot exceed the amount of debits. □

Theorem 2: The schemes implement impersonation safety. 

Proof of Theorem 2: 

By the soundness requirements of the identification schemes used to obtain access
to accounts, only authorized parties will get access to their accounts, as long
as they keep their secret keys secret. Also, an adversary cannot produce a
signature on behalf of a payer who is not cooperating with him. This follows from
the assumption that the signature scheme used by the payer to sign an encrypted
payment order is existentially unforgeable: a new signature cannot be produced
without knowledge of the secret key. If the same signed message is reposted in the
same payment period, then it will be removed, either after the payer has failed
to prove knowledge of some part of the post, or because duplicates are ignored
(depending on the approach taken). For the approach where duplicates are removed,
we require the payment orders to be encrypted using a public key specific to the
time period between two accounting sessions (corresponding to the
mix-decryptions). Since the relationship between the secret keys of different
intervals is unknown, it is not possible to force an old encryption to be
accepted in a new time interval.

Theorem 3: The schemes implement overspending blocking. 

This follows trivially from Theorem 1, and the fact that for each payment one 
signature has to be generated, and for each such signature, one account is billed. 

Theorem 4: The schemes implement payment blocking / revocability. 

Proof of Theorem 4: (Sketch) 

Since a payment can only be made from an account by producing a signature and
sending it to the transaction center, and the signatures used for this identify
the account owners, an account owner can stop all payments from his account by
requesting that the transaction center accept no signature of his. This
corresponds to either putting his public key on a blacklist or removing it from
the list of valid public keys. Similarly, the bank of an account holder can do
the same to block the account holder's access to his funds. □

Theorem 5: The schemes implement framing-freeness.

Proof of Theorem 5: (Sketch) 

This follows from Theorem 2 and the fact that nobody but a user (not even the
bank servers) knows the secret key of this user. Therefore, it is not possible to
produce a set of transcripts indicating that a party performed a given payment
without the cooperation of this party. □

Theorem 6: The schemes implement uniform anonymity. 

Proof of Theorem 6: (Sketch) 

The only way a payment can be initiated is by posting an encrypted and signed
payment order on the bulletin board of the transaction center. The signature
identifies the payer; if the identity of the signer cannot be established (or he
is not authorized to make payments), then the posted message will be ignored.
Then, the only way that merchants can be credited is by decrypting all the
payment orders. The link between encrypted and decrypted items can only be
established by a quorum of mix servers. If fewer servers than a quorum could
correlate input and output items, this would contradict the assumption that the
ElGamal encryption scheme is probabilistic. By the traceability option (see the
next theorem), the anonymity of any valid encrypted or decrypted payment order
can be removed; therefore, all participants enjoy the same degree of anonymity. □

Theorem 7: The general scheme implements the first two methods for
traceability; the ElGamal-based scheme implements full traceability.

Proof of Theorem 7: (Sketch) 

We have established above that for each account that is credited, the transaction 
center has a signature of the party whose account will be debited: the link 
between the two can always be established either by decrypting a single posted
encrypted and signed payment order (tracing from payer to payment order), or
by (potentially partially) re-encrypting a decrypted payment order (arriving at
the payer information from the payment order). In addition, the third tracing
option (comparison) can be performed in the ElGamal-based scheme by using the
verification protocol for undeniable signatures. □
Abstract. Mobile users should be able to buy their handsets and then get service
from any service provider without physically taking the handset to the provider's
location or manually entering long keys and parameters into the handset. This
capability to activate and provision the handset remotely is part of the current
North American wireless standards and is referred to as ‘over the air service
provisioning’ (OTASP). We examine current proposals and point out some of their
limitations. Often the knowledge shared between the mobile user and the network
is not fully specified and hence not exploited. We depart from this norm by,
first, classifying the various ways in which secrets can be shared and, second,
making the assumed shared knowledge explicit and using it to construct various
schemes for OTASP. We present a different OTASP scheme for each of the following
assumptions: 1) availability of a land line, 2) the public key of a CA stored in
the handset, 3) a weak secret shared by the mobile user and the network, and
4) a secret of the mobile user that can only be verified by the network.



1 Introduction 

In the future, more users will continue to migrate to mobile phones and the 
existing users of analog phones will also migrate to authenticated digital mobile 
phones. In order to have the mobile handset activated, first, authorizing infor- 
mation from the user is needed by the service provider, and then parameters like 
long lived keys, telephone numbers, and other information need to be securely 
distributed to the handset. This entire process will be referred to as service pro- 
visioning. It is important that the mobile users have their handsets provisioned 
in the most convenient way possible. Ideally, the best methods for provisioning 
of handsets should allow for: 

1. Interchangeability of handsets: Users should be able to buy handsets from 
any vendor as done currently with wired phones. 

2. Interchangeability of service providers: Users should be able to get service
from any service provider in their region.

3. Spontaneity: Users should be able to activate service when they want without 
having to wait to receive special codes, PINs, or even provisioned handsets. 

4. Remote provisioning: Users should be able to provision their handsets re- 
motely, over the air, without having to take the phone to a special service 
location. 
5. Convenience: The service provisioning process should not require the users
to enter long parameters manually into the phones. 

This does not mean that other arrangements are not possible, for example one
where the handset is bought with service already activated. Furthermore, service
will only be provided by the service providers if authorizing information
(e.g. credit card information) is given by the mobile user and verified.

1.1 Example OTASP Scenario 

A mobile user buys a handset and now wants to get service. The user first places
a wireless call to a regional service provider (one of the calls allowed by the
unprovisioned handset) using the handset's id number. The service operator comes
on-line and requests authorizing information (e.g. credit card information), which
the user provides. Then the network downloads to the phone its long-lived key,
telephone number, and other parameters. Now the mobile user has service and
can immediately make calls if so desired. This entire exchange happens securely,
so that others can neither eavesdrop on the authorizing information nor
successfully impersonate the network to the user or the user to the network.

1.2 North American OTASP Proposal 

Currently, the North American cellular standard specifies an OTASP protocol
[?] using Diffie-Hellman key agreement. First the network transfers a 512-bit
prime $p$ and a generator $g$. Then a Diffie-Hellman key exchange occurs: the
mobile and the network pick their respective random numbers $R_m$ and $R_n$ and
calculate and exchange the exponentials $g^{R_m} \bmod p$ and $g^{R_n} \bmod p$.
Each raises the received exponential to its own secret random exponent to form
$g^{R_m R_n} \bmod p$. The 64 least significant bits of $g^{R_m R_n} \bmod p$
serve as the long-lived key, called the A-key.
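The exchange just described can be sketched in a few lines of Python. This is a toy sketch under stated assumptions, not the standard's code: the modulus `PRIME = 2**127 - 1`, the generator `G = 3`, and the helper `a_key_from_shared` are illustrative choices, and nothing authenticates the exchanged values, which is the weakness discussed next.

```python
import secrets

# Toy parameters (assumption): 2**127 - 1 is a Mersenne prime chosen only so the
# example runs instantly. It is NOT a safe prime and is far too small for real use.
PRIME = 2**127 - 1
G = 3


def a_key_from_shared(shared: int) -> int:
    """Return the 64 least significant bits of g^(R_m * R_n) mod p, i.e. the A-key."""
    return shared & ((1 << 64) - 1)


# Each side picks its secret random exponent.
r_m = secrets.randbelow(PRIME - 2) + 1  # mobile's R_m
r_n = secrets.randbelow(PRIME - 2) + 1  # network's R_n

# Exponentials exchanged over the air, without any authentication.
x_m = pow(G, r_m, PRIME)  # mobile -> network: g^R_m mod p
x_n = pow(G, r_n, PRIME)  # network -> mobile: g^R_n mod p

# Each side raises the received exponential to its own secret exponent.
a_key_mobile = a_key_from_shared(pow(x_n, r_m, PRIME))
a_key_network = a_key_from_shared(pow(x_m, r_n, PRIME))
```

Both sides end up with the same 64-bit value only because both computed the same $g^{R_m R_n} \bmod p$; an attacker who substitutes either exponential changes that value on one side without either party noticing.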

The Diffie-Hellman key exchange is secure against passive attacks. Furthermore,
active attacks may be difficult to mount on the radio interface if transmission
of messages needs to be blocked or substituted, thus offering some “practical”
security against active attacks. For our discussion, we sidestep the radio issues
by assuming the existence of active attackers. Below we list some limitations of
the proposal.

Limitations 

1. The first problem with this use of the Diffie-Hellman key exchange is that
it is unauthenticated and susceptible to a man-in-the-middle attack. An
attacker can impersonate the network and then in turn impersonate the
user to the network. This way the attacker can select and know the A-key
as it relays messages between the user and the network to satisfy the
authorization requirements. The man-in-the-middle attack must also be carried
out on the voice channel.
2. The parameters $g$ and $p$ are unverified, which means that an attacker can
replace them, for example, with a prime $p'$ such that $p'-1$ has only small
factors, which will allow the attacker to solve the discrete log and recover
$R_m$ from the exponential $g^{R_m} \bmod p'$ sent by the mobile. This is another
way for the attacker to discover the A-key on the mobile side while performing a
man-in-the-middle attack. A more powerful attack, which we describe next, is
thwarted by a good choice made by the protocol. Let's say the attacker sends some
$g^{X} \bmod p$, for an exponent $X$ of its choosing, to the network, which forms
$g^{X R_n} \bmod p$. The attacker now has to send some $y$ to the mobile such that
$y^{R_m} \bmod p' = g^{X R_n} \bmod p$. Since $R_m$ has been recovered by the
attacker from $g^{R_m} \bmod p'$, this equation may be solvable for $y$. If so,
the attacker now knows the A-key shared by the handset and the network. This
attack is more powerful than the previous one because the same A-key, known to
the attacker, would reside in both the mobile and the network. Fortunately, this
attack is not possible, because the protocol requires the mobile to receive the
network's exponential, $g^{R_n} \bmod p$, along with $g$ and $p$ before it will
send its own exponential $g^{R_m} \bmod p$.

3. There are no provisions to check for the equivalent of weak keys for
Diffie-Hellman. For example, if the attacker substitutes 1 or $-1$ for the
exponentials exchanged on both sides, then the Diffie-Hellman key will be 1 or
$-1$ and known to, or easily guessed by, the attacker. Note that this attack does
not require the attacker to be in the middle encrypting and decrypting the
transmission under different keys. Once 1 or $-1$ has been substituted, the
attacker only has to eavesdrop. Importantly, the same Diffie-Hellman key, 1 or
$-1$, is also agreed upon by both the handset and the network with probability
$1/2$. If the prime $p$ is not a safe prime, then other values are also possible;
as a precaution all of them should be checked, and if the Diffie-Hellman key
matches any of them then the session should be aborted by the network.

4. We present another attack which, unlike the previous one, is undetectable by
the network, yet results in the same known A-key residing in the network and the
handset. The attacker performs a Diffie-Hellman key exchange with the network
and agrees on a 512-bit key, $K$. The attacker now wants the mobile to have the
same 512-bit key, $K$. The attacker, pretending to be the network, sends to the
mobile $(g, K+1, K)$ instead of $(g, p, g^{R_n} \bmod p)$. The handset calculates
$K^{R_m} \bmod (K+1)$ as the mobile's key. Since $K \equiv -1 \pmod{K+1}$, this
key will equal 1 if $R_m$ is even and $K$ if $R_m$ is odd. Thus with probability
$1/2$ the attacker has forced the network and the handset to have the same
512-bit key, $K$, and thus the same 64-bit A-key.

5. The protocol is a key agreement protocol, which is normally fine, but a key
distribution protocol, where the network creates a well-chosen and random
(i.e. unpredictable) A-key and distributes it to the handset, would have been
preferable in this situation. This may allow the network to associate the A-key
with a handset in some deterministic manner, which is good for its key storage
and management procedures. This can be done via a pseudorandom function (PRF)
that is indexed by a master key and maps handset/phone numbers into 64-bit
A-keys. Thus instead of storing thousands and perhaps millions of keys and
protecting them, the network can store just one master key and use the PRF to
recover the particular A-key associated with a handset's number. The master key
can be stored, and the related algorithm executed, in a protected device, so that
the A-keys are never seen by anyone.

6. The size of the prime, 512 bits, may be too short. 
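Limitations 3 and 4 can be checked numerically. The sketch below is illustrative only: `mobile_key` models a handset that performs no parameter checks, `is_weak_dh_key` is a hypothetical network-side test that assumes $p$ is a safe prime (so the only elements of order 1 or 2 are $1$ and $p-1$), and the 512-bit `K` is a random stand-in for the key the attacker shares with the network.

```python
import secrets


def mobile_key(g: int, p: int, network_exp: int, r_m: int) -> int:
    """What the handset computes: (received exponential)^R_m mod p, with no checks on g or p."""
    return pow(network_exp, r_m, p)


def is_weak_dh_key(shared: int, p: int) -> bool:
    """Hypothetical network-side check for limitation 3, assuming p is a safe prime:
    the elements of order 1 or 2 are then exactly 1 and p - 1 (i.e. -1 mod p)."""
    return shared in (1, p - 1)


# Limitation 4: the attacker shares a 512-bit key K with the network (random stand-in)
# and sends (g, K + 1, K) to the mobile instead of (g, p, g^R_n mod p).
K = secrets.randbits(512) | (1 << 511)

# Because K = -1 mod (K + 1), the mobile's key depends only on the parity of R_m.
key_even = mobile_key(3, K + 1, K, 10)  # R_m even -> 1
key_odd = mobile_key(3, K + 1, K, 7)    # R_m odd  -> K, the key the network holds
```

The network-side test catches the substitution of 1 or $-1$ in limitation 3, but it cannot catch limitation 4: the network's own key $K$ is an ordinary-looking value, and the damage happens on the handset, which never validates $p$.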

The above attacks should not be surprising once it is decided that an
unauthenticated Diffie-Hellman key exchange will be used. In fact one can make a
general statement that any unauthenticated key exchange will reveal the
authorizing information (e.g. credit card number) to an active adversary. When a
mobile requests an OTASP session, the active attacker blocks the session and,
pretending to be the network, carries on an OTASP conversation with the mobile
user. Once the user reveals the credit card number, the attacker terminates the
session and can use the credit card number to get service on other mobile phones
or for other fraudulent purposes. Thus the security of such protocols lies
in the practical difficulty of implementing active attacks. Despite its
limitations, this is an interesting application of the Diffie-Hellman key
agreement protocol and makes service provisioning much more convenient.



1.3 Carroll-Frankel-Tsiounis Key Distribution Proposal

This is a key distribution proposal which has the benefit of allowing the network
to randomly choose strong keys from the space of A-keys. The
Carroll-Frankel-Tsiounis (CFT) proposal also uses Rabin's scheme to speed up
computation, as did the Beller-Chang-Yacobi protocol [?], and also assumes that
each handset possesses the public key of a certificate authority (CA). However, it
interestingly differs from other protocols in its extensive use of unforgeable
signatures and of encryption that is semantically secure or plaintext aware. The
protocol is as follows:

1. The network sends the mobile its public encryption key signed by the CA 
(unforgeable signatures are used here). 

2. The mobile verifies the network's public encryption key and then generates a
random session key SK and a random pad AP. It encrypts both SK and AP using
the network's public encryption key and sends them to the network
(the encryption here is semantically secure).

3. The network recovers the SK and the AP and uses the SK to perform sym- 
metric encryption of the A-key and the AP which it sends to the mobile 
(symmetric encryption here is plaintext aware). 

4. The mobile verifies the AP in the decryption; the handset and the network 
now both possess the A-key which is used to derive the voice encryption key 
and set up an encrypted voice channel. 

5. At this point the operator requests authorizing information (e.g. credit card 
information) from the user. If the user furnishes the information then the 
user has been authenticated to the network and service will be provided in 
the future. 
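Steps 3 and 4 can be sketched as follows. This is a toy model, not the CFT construction: the public-key step 2 is assumed to have already delivered `sk` and `ap`, and the plaintext-aware symmetric encryption is approximated by a hypothetical encrypt-then-MAC scheme built from HMAC-SHA256 (`seal`/`unseal`). The point illustrated is that the mobile accepts the A-key only when its own pad AP comes back intact.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def _subkeys(sk: bytes) -> Tuple[bytes, bytes]:
    """Derive separate encryption and MAC keys from the session key SK."""
    enc = hmac.new(sk, b"enc", hashlib.sha256).digest()
    mac = hmac.new(sk, b"mac", hashlib.sha256).digest()
    return enc, mac


def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: concatenated HMAC-SHA256(key, counter) blocks."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def seal(sk: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC under SK (stand-in for the plaintext-aware encryption)."""
    enc, mac = _subkeys(sk)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(enc, len(plaintext))))
    return ct + hmac.new(mac, ct, hashlib.sha256).digest()


def unseal(sk: bytes, blob: bytes) -> Optional[bytes]:
    """Return the plaintext, or None if the MAC check fails."""
    enc, mac = _subkeys(sk)
    ct, tag = blob[:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac, ct, hashlib.sha256).digest()):
        return None
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc, len(ct))))


def mobile_accept(sk: bytes, ap: bytes, blob: bytes) -> Optional[bytes]:
    """Step 4: accept the 8-byte A-key only if the pad AP chosen in step 2 comes back."""
    pt = unseal(sk, blob)
    if pt is None or not hmac.compare_digest(pt[8:], ap):
        return None
    return pt[:8]


# Step 2 is assumed done: SK and AP reached the network under its public key.
sk = secrets.token_bytes(16)
ap = secrets.token_bytes(16)

# Step 3: the network encrypts the 64-bit A-key together with AP under SK.
a_key = secrets.token_bytes(8)
message = seal(sk, a_key + ap)

received = mobile_accept(sk, ap, message)
```

A tampered ciphertext fails the MAC check, and a reply built for a different pad fails the AP comparison; in both cases `mobile_accept` returns `None`.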
We make some cautionary observations: the CFT protocol should not be viewed as a
“solution to all problems”; the components that surround it must also be
designed within the correct security model. We provide specific examples below:

The CFT protocol assumes that a unique A-key is generated for every OTASP 
attempt by the handset. If the A-key generation process does not guarantee this 
then the overall protocol will be insecure; as an example the same A-key may be 
generated for the same handset/phone number combination which would allow 
this attack: a handset uses its serial number and phone number (if assigned) 
to access the network for OTASP. At this point the attacker blocks the access. 
Instead the attacker picks a random session key SK and a random pad AP and 
sends them to the network using the blocked handset’s serial/phone number. 
The network responds with the encrypted A-key which the attacker retrieves 
and aborts the connection. Now the attacker is in possession of the A-key for 
that handset. If the legitimate handset again accesses the network with its own 
session key and pad, the network will again transport the same A-key to the 
handset encrypting it with the session key. Now the handset will have the A-key 
and the user, on the encrypted voice channel, will give authorizing information 
thus successfully completing service provisioning. Unfortunately, the attacker 
already has the A-key and he also can use it later to make fraudulent calls. Thus 
in the CFT model, one should not directly use a PRF to associate an A-key 
to a handset/phone number without guaranteeing uniqueness at every OTASP 
attempt. Later, we will show that it is not necessary to have this restriction in 
a key distribution protocol. 
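The PRF idea from limitation 5 of Section 1.2 can be reconciled with the CFT uniqueness requirement by feeding a per-attempt counter into the PRF. The sketch below is one possible construction, not the paper's: HMAC-SHA256 as the PRF, the length-prefixed input encoding, and the `attempt` counter are all assumptions, and the identifiers `"ESN-0001"` and `"555-0100"` are hypothetical.

```python
import hashlib
import hmac


def _encode(*fields: str) -> bytes:
    """Length-prefix each field so that different inputs never encode the same way."""
    out = b""
    for field in fields:
        data = field.encode()
        out += len(data).to_bytes(4, "big") + data
    return out


def derive_a_key(master_key: bytes, handset_id: str, phone_number: str, attempt: int) -> int:
    """Hypothetical PRF: HMAC-SHA256 under the master key, truncated to a 64-bit A-key.

    The per-attempt counter makes each OTASP attempt yield a fresh A-key; without it,
    the same handset/phone pair would always be sent the same A-key.
    """
    msg = _encode(handset_id, phone_number, str(attempt))
    digest = hmac.new(master_key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


# Placeholder master key; in practice it would stay inside a protected device.
master = bytes(range(32))
k1 = derive_a_key(master, "ESN-0001", "555-0100", attempt=1)
k1_again = derive_a_key(master, "ESN-0001", "555-0100", attempt=1)
k2 = derive_a_key(master, "ESN-0001", "555-0100", attempt=2)
```

The network still avoids storing one key per handset, but it must now remember a small counter per handset, so that an A-key captured by the blocking attacker above is never re-sent on a later attempt.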

Secondly, a mild form of denial of service attack is possible if the hand- 
set/phone numbers are not used in the cryptographic operations of the CFT 
protocol or the later verification stages. Actually, there are two forms of denial 
of service attacks. In the first one, the customer’s credit card is not charged and 
OTASP is unsuccessful, while in the second one, the customer’s credit card is 
charged and OTASP is unsuccessful. There is little that can be done to pre- 
vent the first form of the attack in any protocol, but we would like to assure 
the customers that if their credit cards are charged then OTASP will be suc- 
cessful and their service will be activated. An attacker can perform the second 
form of attack by substituting other numbers in place of the handset’s true id 
number and user’s phone number throughout the protocol. The protocol will 
be completed and credit cards charged, but the network will not have activated 
the true handset/phone number. Verification steps done after the CFT protocol
can also be satisfied as long as they do not use handset/phone numbers as part
of the cryptographic operations. Thus later attempts by the user to access the
system will be rejected. This attack is possible because the handset id number
used in communication is not part of the public key encryption of the SK and
the AP sent by the mobile to the network. If it were, then the network would know
that no attacker could have substituted another, false handset/phone number.
Fortunately, the current North American standard does perform cryptographic
operations using the handset/phone numbers in the verification steps following
the A-key agreement. So the attack would be detectable at this time, and the
credit cards should not be charged until after the verification steps.

Finally the CFT protocol tries to minimize the impact due to lack of strong 
random sources and points out that if independent sources of randomness are 
used, then a weak source for the SK is not problematic as long as the AP is
strongly random. We caution that this is not true if the CFT protocol is
embedded in the current North American standard because candidates for the 
SK can be used to decrypt the symmetric encryption and give candidates for 
the A-key. These candidates can be further verified because the A-key is used 
to derive another session key called the SSD which is further used to answer 
challenges. An attacker can see if a candidate A-key results in the same response 
to the challenge as actually seen in a session. If not then the next candidate is 
tried until the true A-key is recovered. However, these off-line verifications of 
SK guesses would not be possible if the challenge/response protocol following 
the CFT protocol was replaced by a more secure authentication protocol (e.g.
zero knowledge) which does not reveal information about the A-key. Neverthe- 
less, the weak source for SK must have some minimum strength so that on-line 
verifications of A-key are not practical. Assume an attacker has good guesses for 
SK and hence, guesses for A-key. If these guesses are few then the attacker can 
use each A-key guess to establish a session (e.g. call origination). The attacker 
will be unsuccessful on all tries except the one with the true A-key. 
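The off-line verification of SK guesses can be illustrated with a toy transcript. Every component here is an assumption made for the sketch: SK is derived from a 16-bit seed (`WEAK_BITS`), the A-key is encrypted by XOR with a hash-derived pad, and the SSD and challenge response are modeled with HMAC-SHA256. The AP is omitted because, as noted above, a strong AP does not prevent this check.

```python
import hashlib
import hmac
import secrets

WEAK_BITS = 16  # assumption: the weak random source gives SK only 16 bits of entropy


def sk_from_seed(seed: int) -> bytes:
    """Hypothetical weak generator: a 128-bit SK derived from a small seed."""
    return hashlib.sha256(seed.to_bytes(4, "big")).digest()[:16]


def xor_with_pad(sk: bytes, data: bytes) -> bytes:
    """Toy symmetric encryption of the A-key under SK (XOR is its own inverse)."""
    pad = hashlib.sha256(b"pad" + sk).digest()[: len(data)]
    return bytes(a ^ b for a, b in zip(data, pad))


def response(a_key: bytes, challenge: bytes) -> bytes:
    """Hypothetical challenge response: an SSD derived from the A-key, used as a MAC key."""
    ssd = hmac.new(a_key, b"SSD", hashlib.sha256).digest()
    return hmac.new(ssd, challenge, hashlib.sha256).digest()[:8]


def recover_a_key(ciphertext: bytes, challenge: bytes, observed: bytes):
    """Off-line search: decrypt under every candidate SK and test against the response."""
    for seed in range(2**WEAK_BITS):
        candidate = xor_with_pad(sk_from_seed(seed), ciphertext)
        if hmac.compare_digest(response(candidate, challenge), observed):
            return candidate
    return None


# Transcript the eavesdropper records: encrypted A-key, then a challenge and its answer.
a_key = secrets.token_bytes(8)
ciphertext = xor_with_pad(sk_from_seed(secrets.randbelow(2**WEAK_BITS)), a_key)
challenge = secrets.token_bytes(16)
observed = response(a_key, challenge)
```

The search needs at most $2^{16}$ trial decryptions, and each candidate A-key is confirmed or rejected purely from recorded traffic, which is why a zero-knowledge style authentication step would remove this check.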



2 Classification of Shared Knowledge 

Different OTASP schemes are possible depending upon what shared knowledge 
is possessed by the mobile user/handset and the network. At one extreme, one 
can assume that the mobile handset and the network each have a public key and 
both keys are known to the handset and network. Similarly, for the symmetric 
cryptography case, we can assume that the mobile handset and the network both 
share a strong secret (64 bits or more). Next we can assume that the mobile 
handset possesses the public key of a CA and thus indirectly has knowledge of 
the network’s public key. At the next level we can assume that the mobile user 
and the network only share a weak secret. Finally, the weakest assumption is 
that the network can only verify a secret relayed to it by the mobile user. That
is, the network does not even share a secret with the user, but can only verify it.

Orthogonal to this is the availability of a secure channel (land line phone)
which can aid in OTASP. Although we describe the various schemes with respect
to the wireless environment, they can be used in other insecure environments
(e.g. the Internet) where there is a need to provision remotely. For different
situations, different assumptions may be valid. In some environments there may be
an agreement on the certificate authority and hence on the root key to be
pre-stored in every device. However, in other environments no easy agreement on
a CA and its procedures may be possible, and hence the other, non-CA-based
schemes need to be used.
3 Secure Password Scheme

In this section we review methods for using weak secrets (e.g. passwords)
for authentication and key agreement. These methods will be used repeatedly.
Standard protocols used in symmetric key authentication and key agreement
are susceptible to off-line dictionary attacks. A typical protocol consists of a
challenge $R$ sent from user A to user B. User B responds with a function $f_P(R)$
of the challenge $R$ and the password $P$. An eavesdropper hearing the pair $R$
and $f_P(R)$ can use this information to verify guesses. Users do not pick
passwords randomly from the entire space of possible passwords, but instead use
memorable and meaningful passwords. Quite often their passwords are picked from
names and words in dictionaries. So an attacker can perform an off-line dictionary
attack by picking each word $P'$ in the dictionary, forming $f_{P'}(R)$, and
checking whether it matches $f_P(R)$. If it does not, the attacker tries the next
guess until it finds a match, which reveals the password $P$. No matter how
complicated the protocol, if it only uses symmetric cryptography then it will be
susceptible to a variation of the off-line dictionary attack.
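The attack above needs nothing more than the overheard pair and a word list. In this sketch $f_P(R)$ is modeled, as an assumption, by HMAC-SHA256 keyed with the password; the password `"falcon"` and the five-word dictionary are made up for illustration.

```python
import hashlib
import hmac
import secrets


def f(password: str, challenge: bytes) -> bytes:
    """Hypothetical symmetric response function f_P(R) = HMAC-SHA256(P, R)."""
    return hmac.new(password.encode(), challenge, hashlib.sha256).digest()


def dictionary_attack(challenge: bytes, response: bytes, dictionary):
    """Try each word P' until f_P'(R) matches the overheard f_P(R)."""
    for guess in dictionary:
        if hmac.compare_digest(f(guess, challenge), response):
            return guess
    return None


R = secrets.token_bytes(16)   # A's challenge, sent in the clear
observed = f("falcon", R)     # B's reply, overheard by the eavesdropper
words = ["apple", "dragon", "falcon", "monkey", "sunshine"]
```

Replacing HMAC with any other symmetric function does not help: as long as $R$ and $f_P(R)$ are both visible, each guess can be checked off-line.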

3.1 Review of Secure Password Protocols 

There has been some progress towards password protocols resistant to off-line
dictionary attacks [?], [?], [?]. Lomas et al. [?] were the first to propose a
three-party protocol using passwords that were protected against dictionary
attacks. Bellovin and Merritt [?] made similar protocols, called Encrypted Key
Exchange (EKE), for two-party authentication and key exchange using passwords,
which still had protection against dictionary attacks.



Fig. 1. The Diffie-Hellman Encrypted Key Exchange (DH-EKE) (parties: Mobile, Network)
[?] showed that information leakage is possible in EKE, which allows one to
retrieve the passwords after hearing some exchanges. Furthermore, [?] presented
attacks against variations of EKE and of the protocols of Gong et al. where
varying or unverified parameters were used. Fixing or verifying the parameters
therefore seems necessary. RSA parameters are unverifiable and hence must be
fixed. However, ElGamal and Diffie-Hellman (DH) parameters like $g$ and $p$ are
verifiable and hence can either be fixed, or be changing but verified during each
exchange.

We now describe Bellovin and Merritt's secure password protocol, called Encrypted
Key Exchange (EKE), based on the Diffie-Hellman key exchange (DH-EKE). The
mobile and the network (see Figure 1) have agreed upon a $g$ and a $p$; here we
assume that $g$ and $p$ are fixed. The mobile picks a random number $R_m$ and
calculates $g^{R_m} \bmod p$. The mobile further encrypts this value using the
password $P$ to get $P(g^{R_m} \bmod p)$ and sends it to the network. The network
similarly calculates and sends $P(g^{R_n} \bmod p)$ to the mobile. Both decrypt the
messages and exponentiate with their respective secret random exponents to form
the Diffie-Hellman key, $g^{R_m R_n} \bmod p$, which serves as the session key.
The session key is then authenticated by random challenges to prevent replay and
other attacks. Since the Diffie-Hellman exponentials $g^{R_m}$ and $g^{R_n}$ are
encrypted using the password $P$, a man-in-the-middle attack is not possible;
furthermore, the password provides authentication, because without knowing it
there would be no agreement on the Diffie-Hellman key. How does this stop
dictionary attacks? Guessing a password $P'$ allows one to recover guesses for
the exponentials $g^{R_m}$ and $g^{R_n}$, but the attacker cannot form the
Diffie-Hellman key $g^{R_m R_n}$ from them and so cannot verify the guess; hence
off-line guesses from the dictionary cannot be verified.
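The message flow of DH-EKE can be sketched as below. This is a toy under explicit assumptions: the group is the small, non-safe prime `PRIME = 2**127 - 1` with `G = 3`, and "encryption under $P$" is a hypothetical XOR with a 128-bit mask derived from the password. Such masking is exactly the kind of encryption that can leak information about $P$, so the sketch only illustrates that matching passwords yield matching keys. The key-confirmation challenges are omitted.

```python
import hashlib
import secrets

# Toy group (assumption): 2**127 - 1 is prime but not a safe prime; far too small in practice.
PRIME = 2**127 - 1
G = 3


def pw_mask(password: str) -> int:
    """Toy 'encryption under P': a 128-bit XOR mask derived from the password."""
    return int.from_bytes(hashlib.sha256(password.encode()).digest()[:16], "big")


def run_dh_eke(mobile_pw: str, network_pw: str):
    """Run one DH-EKE exchange and return (mobile's key, network's key)."""
    r_m = secrets.randbelow(PRIME - 2) + 1  # mobile's secret exponent R_m
    r_n = secrets.randbelow(PRIME - 2) + 1  # network's secret exponent R_n
    # Each side sends its exponential encrypted under its copy of the password.
    to_network = pow(G, r_m, PRIME) ^ pw_mask(mobile_pw)  # P(g^R_m mod p)
    to_mobile = pow(G, r_n, PRIME) ^ pw_mask(network_pw)  # P(g^R_n mod p)
    # Each side decrypts with its own password and exponentiates.
    key_mobile = pow(to_mobile ^ pw_mask(mobile_pw), r_m, PRIME)
    key_network = pow(to_network ^ pw_mask(network_pw), r_n, PRIME)
    return key_mobile, key_network


k_m, k_n = run_dh_eke("4711", "4711")
```

With matching passwords (here a 4-digit code, as in the land-line scheme of the next section) both sides obtain $g^{R_m R_n} \bmod p$; with a wrong password the decrypted exponentials differ and the keys disagree, which the subsequent challenges would expose.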

Encrypting a Diffie-Hellman exponential such as $g^{R_m} \bmod p$ with a symmetric
cipher can leak information; Bellovin and Merritt propose a random padding
method, but this also leaks information allowing an attacker to recover the
password. Some countermeasures are available [?], [?].

4 Secure Channel (Land Line) Available 

Assuming a land line connection is available, we present two different schemes for
OTASP which are secure against man-in-the-middle attacks. The first method of
performing a secure over the air key exchange uses a secure password protocol
(see Figure 2). First the user contacts the operator over a land phone line (the
first two lines in the figure refer to a land line connection and the third line
refers to a wireless connection). The land phone line serves as our authenticated
and private channel. The operator asks questions to verify that the user is
authorized. Then the user is instructed to power on the handset and enter a
4-digit number provided by the operator. The network then uses this 4-digit
number as the password to perform a secure password Diffie-Hellman key exchange
as described in the previous section. Once a temporary DH key has been agreed,
the channel can be encrypted and the messages authenticated using symmetric
techniques, as when executing a secure session. This way the A-key and other
parameters can be securely and privately downloaded to the handset. As an
example, the telephone number can be privately downloaded, and thus the anonymity
of the user can be preserved; the same holds for other parameters.

Fig. 2. Over the Air Key Exchange - Using Land Lines with a Password Protocol

The second method of doing a secure over the air key exchange is to have the
mobile handset display a string and have the user read it to the operator; this
way the man-in-the-middle attack is avoided. The protocol is described in
Figure 3, where the first and last passes occur with the operator on the land
lines while the 2nd and 3rd occur on the wireless connection between the handset
and the network. After the operator has authorizing information from the mobile
user, the mobile is turned on, and in steps 2 and 3 the mobile and the network
send $g^{R_m}$ and $g^{R_n}$ to each other. The operator and the user verbally
verify that some of the digits of $h(g^{R_m}, g^{R_n})$ are the same, where $h$
is a cryptographic hash. If verified, then the network and mobile use
$g^{R_m R_n} \bmod p$ as the A-key; otherwise the network can initiate security
violation procedures. This is useful not just for wireless, but for any kind of
remote update of parameters with human verification.

Fig. 3. An Over the Air Key Exchange Susceptible to Birthday Attacks
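The verbal comparison of Figure 3 can be sketched as follows. The toy group, the fixed-width byte encoding of the two exponentials, SHA-256 as $h$, and reading the hash as 20 decimal digits (about 66 bits, close to the 64 bits discussed next) are all assumptions of this sketch.

```python
import hashlib
import secrets

PRIME = 2**127 - 1  # toy modulus (assumption); not a safe prime, far too small in practice
G = 3


def check_digits(x_m: int, x_n: int, n_digits: int = 20) -> str:
    """Decimal digits of h(g^R_m, g^R_n) read aloud by the user (hypothetical encoding)."""
    h = hashlib.sha256(x_m.to_bytes(16, "big") + x_n.to_bytes(16, "big")).digest()
    return str(int.from_bytes(h, "big") % 10**n_digits).zfill(n_digits)


def rand_exp() -> int:
    return secrets.randbelow(PRIME - 2) + 1


# Honest run (steps 2 and 3): both ends hash the same pair of exponentials.
r_m, r_n = rand_exp(), rand_exp()
x_m, x_n = pow(G, r_m, PRIME), pow(G, r_n, PRIME)
honest_mobile = check_digits(x_m, x_n)
honest_network = check_digits(x_m, x_n)

# Man in the middle: the attacker sends its own exponential on each leg, so the
# mobile hashes (g^R_m, g^R_n') while the network hashes (g^R_m', g^R_n).
mitm_mobile = check_digits(x_m, pow(G, rand_exp(), PRIME))
mitm_network = check_digits(pow(G, rand_exp(), PRIME), x_n)
```

A single random substitution almost never produces matching digits; the danger, discussed next, is that an attacker can search many substitutions for a collision.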

There is a birthday attack which one has to be careful about. Let's say that
we decide that 64 bits will be read back by the mobile user to the operator. It
would seem that the attacker would have to try $2^{64}$ different values so that
the hashes $h(g^{R_m'}, g^{R_n})$ and $h(g^{R_m}, g^{R_n'})$ are the same, a
necessary condition to carry out the man-in-the-middle attack. So the attacker
tries many values for $R_n'$ and $R_m'$ until the two hashes are the same.
Although naive counting would suggest that about $2^{64}$ values need to be tried
to get a significant chance of collision, birthday arguments show that only about
$2^{32}$ values need to be tried to find a collision with probability near 1/2.
To slow things down one can put $g^{R_m R_n}$ in the hash:
$h(g^{R_m}, g^{R_n}, g^{R_m R_n})$. This means the attacker has to do
exponentiations along with hashes, which makes things slower. If the user is
asked to read back alphabetic characters, then about 14 letters are needed to
cover 64 bits. If alphanumerics are used, then about 12 are enough. However,
$2^{32}$ complexity for an attack, even in real time, may not be enough security,
and if $2^{64}$ complexity is required then 128 bits must be read back by the
user, which is about 24 alphanumerics. There is a simpler method of making the
protocol in Figure 3 resistant to the birthday attack, by introducing a
restriction on the sequence of the exchange of the Diffie-Hellman exponentials.
In particular, if we insist that the mobile will not send its $g^{R_m}$ until it
receives $g^{R_n}$ from the network, then the birthday attack is foiled. The
man-in-the-middle attacker was previously able to see both exponentials
$g^{R_m}$ and $g^{R_n}$, and thus it was able to exploit the birthday attack. Now
the attacker has to commit an exponential to the mobile before it sees the
mobile's exponential $g^{R_m}$, thus removing one degree of freedom for the
attacker.
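The birthday effect can be observed at a reduced scale. In this sketch, as an assumption, only `T_BITS = 24` bits of the hash are compared, so the search finishes in well under a second. The attacker, having seen both honest exponentials, grows one table of candidate exponentials for the mobile leg and one for the network leg, and stops when a tag from one table matches a tag from the other.

```python
import hashlib
import secrets

PRIME = 2**127 - 1  # toy modulus (assumption)
G = 3
T_BITS = 24  # assumption: only 24 bits are compared, to keep the demo fast


def tag(x_m: int, x_n: int) -> int:
    """Top T_BITS bits of h(x_m, x_n): the part the user would read back."""
    h = hashlib.sha256(x_m.to_bytes(16, "big") + x_n.to_bytes(16, "big")).digest()
    return int.from_bytes(h, "big") >> (256 - T_BITS)


def rand_exp() -> int:
    return secrets.randbelow(PRIME - 2) + 1


# Exponentials the attacker intercepted from the honest parties.
x_m = pow(G, rand_exp(), PRIME)
x_n = pow(G, rand_exp(), PRIME)


def find_collision():
    """Find a (sent to mobile as g^a) and b (sent to network as g^b) with equal tags."""
    mobile_leg = {}   # tag of h(x_m, g^a) -> a
    network_leg = {}  # tag of h(g^b, x_n) -> b
    tries = 0
    while True:
        a, b = rand_exp(), rand_exp()
        ta = tag(x_m, pow(G, a, PRIME))
        tb = tag(pow(G, b, PRIME), x_n)
        mobile_leg[ta] = a
        network_leg[tb] = b
        tries += 1
        if ta in network_leg:
            return a, network_leg[ta], tries
        if tb in mobile_leg:
            return mobile_leg[tb], b, tries


a, b, tries = find_collision()
```

The loop typically stops after a few thousand iterations, on the order of $2^{12} = 2^{24/2}$, rather than the $2^{24}$ a naive count suggests; the same square-root effect turns $2^{64}$ into about $2^{32}$ for 64 read-back bits.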

We present another version of the protocol with one more round which is not
susceptible to the birthday attack and, furthermore, is resistant to searches for
consistent exponentials by a man-in-the-middle attacker. Thus verification of a
smaller string is sufficient. The protocol is described in Figure 4, where the
first and the final steps are over the land voice link while the middle three
steps occur over the air link. On the wireless link, first the network sends the
hash of its exponential, $h(g^{R_n})$, then the user sends its exponential
$g^{R_m}$, and finally the network sends its exponential $g^{R_n}$. The user first
verifies the hash of the exponential sent by the network. Then both the user and
the network calculate $h(g^{R_m}, g^{R_n})$ and verbally verify its first 4
digits. A man-in-the-middle attacker cannot use birthday-type attacks or do
searches for consistent exponentials because, acting as the network, he has to
commit to the exponential he is using (via the hash) before he sees the user's
exponential. Similarly, acting as the user, the attacker has to commit to its
exponential before the value of the network's exponential, associated with the
hash, is revealed. Thus we need to verify a much smaller string (e.g. 4 digits).
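The commit-then-reveal order of Figure 4 can be sketched as below. SHA-256 as $h$, the fixed-width encoding, the toy group, and the helper names `h`, `four_digits` and `mobile_checks` are assumptions of this sketch; it only shows the three wireless steps and the 4-digit string compared by voice.

```python
import hashlib
import secrets

PRIME = 2**127 - 1  # toy modulus (assumption)
G = 3


def h(*values: int) -> bytes:
    """Hash of one or more group elements (hypothetical fixed-width encoding)."""
    return hashlib.sha256(b"".join(v.to_bytes(16, "big") for v in values)).digest()


def four_digits(digest: bytes) -> str:
    """The short string read over the voice link."""
    return str(int.from_bytes(digest, "big") % 10_000).zfill(4)


# Wireless step 1 (network): commit to g^R_n by sending only its hash.
r_n = secrets.randbelow(PRIME - 2) + 1
x_n = pow(G, r_n, PRIME)
commitment = h(x_n)

# Wireless step 2 (mobile): send g^R_m after the commitment has arrived.
r_m = secrets.randbelow(PRIME - 2) + 1
x_m = pow(G, r_m, PRIME)


# Wireless step 3 (network): reveal g^R_n; the mobile checks it against the commitment.
def mobile_checks(revealed: int) -> bool:
    return h(revealed) == commitment


# Both sides compute the short string to compare, and the shared g^(R_m R_n) mod p.
mobile_string = four_digits(h(x_m, x_n))
network_string = four_digits(h(x_m, x_n))
shared_mobile = pow(x_n, r_m, PRIME)
shared_network = pow(x_m, r_n, PRIME)
```

Because the network-side value is fixed by `commitment` before $g^{R_m}$ is seen, an attacker gets essentially one chance in $10^4$ of matching the 4 digits rather than a collision search.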
Fig. 4. An Over the Air Key Exchange Resistant to Birthday Attacks



There is an existing protocol, called VPl [?], which is related to this protocol,
but it is used for agreeing on a session key, whereas we use our protocol to
update parameters via an authenticated link. Secondly, VPl is a 4-round protocol,
whereas ours is a 3-round protocol on the wireless link. VPl first has both the
user and the network hash their exponentials and exchange the hashes. Then the
actual exponentials are exchanged. Finally, a hash of a function of the
exponentials is calculated and 6 digits are verified by voice. Our protocol is
more efficient in terms of the number of rounds and the messages exchanged. Since
VPl does not use an authenticated link, it ultimately does not protect against
man-in-the-middle attacks, unless the end parties know each other's voice. Our
method can also be used in other environments, and other variations can be built
using different public key schemes.

The safe prime p and the generator g were assumed to be fixed and prestored
in the handset. However, if that is not the case, then the attacker can replace
g and p with g' and p', which may allow it to calculate discrete logarithms
efficiently. If g and p are also sent over the air, then they could also be used as
part of the hash calculation, $h(g, p, g^{R_M}, g^{R_N})$, in order to detect the substitution
of g and p by the attacker.



4.1 Man-in-the-Middle on Voice Channel 

Although we have described the last two protocols as requiring a land line voice
link, one can execute the protocols without it. Thus a handset can be used to
make the voice call itself to the network operator, and a Diffie-Hellman exchange can be
performed on the control channel. No man-in-the-middle attack is possible on the control
channel, as described above. However, a man-in-the-middle attack is possible if
it is also performed on the voice channel. First, the attacker pretends to be the
network to the user and gets the authorizing information. Second, the attacker
pretends to be the user to the network. If this is happening in real time, then
all the information from one conversation is relayed to the other (assume there
are two attackers working in concert). Then on the control channel the
man-in-the-middle attack can proceed, because when it comes time to verify the digits
on the voice link, the two attackers will read different digits to the network
and to the user respectively. Although we have described this as happening in real
time, the attacker could instead get the authorizing information from the
user and then, at a later time, call the network and give it the information to
obtain the A-key and other parameters.

5 Mobile and Network Share Strong Keys 

If the handset and the network already share strong secrets, either in the form of
each other's public keys or strong secret keys, then well-known protocols can be
used to exchange or update parameters. [[] is an example of a protocol using 
public keys to establish session keys. [~] is an example of secure protocols used 
with symmetric cryptography to establish session keys. The session keys can be 
used to encrypt a session and authorizing information can be provided to start 
service and the parameters can be provisioned into the handset. We are trying 
to get strong secrets agreed upon by both sides, but since we don’t possess them 
initially we will try to bootstrap from weak secrets to strong secrets. 

6 Mobile Has CA’s Public Key 

Manufacturers can install a CA's public key in all the phones. This is much easier
than installing a unique public key for each handset. This assumption is the one
used in the CFT protocol and also in SSL. Unlike the CFT protocol, we will
use the handset id as part of our encryption, thus blocking the denial of service
attack. Furthermore, we want our protocol to allow a network to pick
a well-chosen A-key for a specific handset/phone number and to try to distribute
that A-key to the handset repeatedly. The CFT protocol was not able to do
this because it revealed its hand before it was necessary. That is, it revealed the 
A-key before authorization was given on the voice channel. In fact our protocol 
does not require a voice connection to transfer the authorizing (e.g. credit card) 
information from the mobile user to the network. The user can be prompted on 
the handset and the user can enter the information on the handset. We reveal 
the A-key only if the authorization step has been successfully completed. Here 
are the steps of our protocol: 

1. The handset requests OTASP and identifies itself via a handset id number 
or a telephone number if already assigned. 

2. The network sends its public key, signed by the CA, to the handset (unforge- 
able signatures). 
3. The mobile generates a random session key SK, encrypts the handset id
and SK with the network's public key, and sends the result to the network
(using probabilistic or semantically secure encryption).

4. The network decrypts the message and then uses SK to initiate an authenticated
and encrypted session. Voice and messages are both encrypted, and
message authentication is provided, using a message authentication key and
an encryption key derived from the session key SK.

5. The user provides the credit card information to the network. 

6. If the credit card information is verified, then the network distributes the
A-key to the handset.

Note that unlike the CFT protocol, the entire OTASP has been encrypted 
and authenticated using the session key derived from the SK rather than the 
session key derived from the A-key. 
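Step 4 derives a message authentication key and an encryption key from SK, but the text does not say how. The sketch below assumes HMAC-SHA256 with fixed labels as the derivation function. It shows only the authentication half, because Python's standard library has no block cipher. The labels and function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def derive_session_keys(sk: bytes) -> Tuple[bytes, bytes]:
    """Derive (encryption key, MAC key) from SK; the HMAC-SHA256 labels are an assumption."""
    k_enc = hmac.new(sk, b"OTASP encryption", hashlib.sha256).digest()
    k_mac = hmac.new(sk, b"OTASP authentication", hashlib.sha256).digest()
    return k_enc, k_mac


def tag(k_mac: bytes, message: bytes) -> bytes:
    """Message authentication code over one OTASP message."""
    return hmac.new(k_mac, message, hashlib.sha256).digest()


def verify(k_mac: bytes, message: bytes, mac: bytes) -> bool:
    """Constant-time check of a received MAC."""
    return hmac.compare_digest(tag(k_mac, message), mac)


# The handset picks SK (step 3); after decryption both ends derive the same keys (step 4).
sk = secrets.token_bytes(32)
k_enc, k_mac = derive_session_keys(sk)
msg = b"A-key provisioning message"   # hypothetical payload for step 6
assert verify(k_mac, msg, tag(k_mac, msg))
```

Keeping the two keys separate means a key used for encryption is never also used for authentication. This matches the two keys named in step 4.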



7 Mobile User and Network Share Weak Secret 

It may be that the user and the service provider have had previous contact
and now share some personal information about the user. The user might have
already interacted with the service provider, or the user may be an existing land
line customer of the service provider who now wants to get mobile service. It is
not clear what personal information the network and the user share.
Is it the mother's maiden name? The last 4 digits of the social security number
(SS#) or ZIP code? Obviously there must be some information that the operator
has which it uses to authenticate the user on the voice link. Assume it is the last
4 digits of the SS#. If so, then we can perform a secure password protocol using
the last 4 digits of the SS#. First the user enters the last 4 digits of the SS#
into the handset, and then, using the handset itself, a secure password protocol
is executed and the A-key and other parameters are updated in the handset.
Note that all this happens without a voice call being placed either on the land line or
on the handset. In a sense, a temporary password (the last 4 digits of the SS#) is used
and then other parameters are updated. The protocol is the same as the one in
Figure 3, except that all the communications take place over the wireless phone. There
is no need for a real-time operator to be involved, although the protocol does not
preclude one.

Perhaps credit card information can serve as the shared secret. Since the 
network operators have access to credit card information about all users, this 
information can be used as a shared secret. If an operator, given the name of a
person and the type of card (e.g., Citibank Visa), can look up the credit card information,
then that can serve as the shared secret. This may not be possible if the operator
can only verify the credit card number but cannot look up the number
from just the name of the user. When the user contacts the network operator,
the operator will ask for the user's name and type of credit card. Then the
user is prompted to enter the card number, which will be used as the weak
shared secret to initiate a secure password protocol. If this succeeds, then the A-key
and other parameters can be distributed to the handset over an encrypted and
authenticated session.

8 Network Can Verify Mobile User’s Secret 

In the worst case there is no shared secret; however, the mobile user has a secret
(name + credit card number + expiration date) which the network can verify.
The network does not know this information ahead of time, but can only verify
it. Now we cannot use the previous techniques and hence come full circle to using
something very similar to the current North American OTASP proposal. There
do not seem to be any cryptographic techniques that will protect against
active attackers in this situation.

We will also perform an unauthenticated Diffie-Hellman key exchange or another
unauthenticated public key (e.g., Rabin) based key distribution. However,
we will do it in such a way that performing a man-in-the-middle attack is very
difficult, requiring service denial to much of the service region for extended
periods of time. We do this by disguising an OTASP key exchange as a normal
system access (e.g., call origination) followed by an encrypted and authenticated
session. In order to do this, a random ID/telephone number for the handset
should be used to make a call origination. The network, when it sees the random
ID/telephone number, will know that this is not a legitimate number. The
network knows that this could happen either because there was an error in transmission
or because someone is trying to initiate OTASP. The network then continues to pretend
it is a normal call and sends random bits to the handset, and the handset also
sends random bits to the network. However, the first 1024 bits of the disguised
call can carry the exponentials $g^{R_M} \bmod p$ and $g^{R_N} \bmod p$. The key is derived and used
to encrypt the rest of the session after some predetermined time, say after 10
seconds of data. Then the call should be placed to the operator in the network,
and the mobile user should relay the secret credit card information, which the
network will verify. If the information is verified, then the A-key can be transported
to the handset along with other parameters.

If the exchange is disguised well, then the only way for the attacker to act
as man in the middle is to try to do so with most calls that are going on,
hoping to find one that is truly an OTASP call. To have any significant
probability of finding such calls, it will have to block most calls, because
an OTASP call is rare, occurring once or a few times in the lifetime of a phone. A
call origination, on the other hand, is very frequent, so the attack is
expensive. If such blocking or denial of service does occur, then it should
become easier to find the attacker, and it becomes all the more important to find
the source and put an end to the blocking. The security of such a protocol is
not cryptographic but practical. We make the practical assumption that the
adversary is not so powerful as to block most calls and still go undetected.

The security of this strengthened unauthenticated Diffie-Hellman exchange is analogous
to the strength of password schemes in withstanding on-line attacks. An
attacker guessing at the correct password in one session has a low probability of
guessing it correctly, but guessing over thousands of sessions the attacker has a
high probability of recovering a password for some user. Similarly, an attacker
performing a man-in-the-middle attack on a session has a low probability of
success, but over thousands of sessions the probability of success is high.
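To make the analogy concrete, suppose that each session gives the attacker an independent success probability q. This is an illustrative assumption, not a figure from the text; for example, q = 10^{-4} for a single guess at a 4-digit secret. Over N sessions the probability of at least one success is then:

```latex
\Pr[\text{at least one success in } N \text{ sessions}] = 1 - (1 - q)^{N},
\qquad
1 - \left(1 - 10^{-4}\right)^{10^{4}} \approx 1 - e^{-1} \approx 0.63 .
```

With N = 10^4 sessions, a one-in-ten-thousand chance per session grows to roughly a two-in-three chance overall. This is the sense in which a low per-session success rate becomes a high overall one when repeated over thousands of sessions.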

The Diffie-Hellman exchange is only strengthened if the OTASP call is
indistinguishable from a normal call origination, and this is very tricky to guarantee.
For example, if g and p were sent during the OTASP call, then the call would be
distinguishable, because an attacker can check whether p is a prime. If it is not, then the attacker
knows that this is a normal call origination and there is no need to perform a
man-in-the-middle attack; we had assumed that g and p are fixed and known. If we do
want to send g and p as part of the session, then outlines some methods to
do this safely. A general method of sending primes is to send the random string
used in picking a prime rather than the actual prime. At the other end,
the same random string is used to pick the same prime or safe prime. To
transfer g, a random string is sent to the other party, who checks whether the string is a
generator modulo p. If so, then it is used as g; otherwise the incremented value is checked, and so on until
a generator is found.
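The idea of sending the random string rather than the prime can be sketched as below. Both ends run the same deterministic procedure on the shared string, so they arrive at the same safe prime p and generator g. Several details are assumptions for illustration: stretching the string with SHA-256, using fixed Miller-Rabin bases, and the function names. The small bit size only keeps the example fast.

```python
import hashlib

BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n: int) -> bool:
    """Miller-Rabin with fixed bases, so both parties reach the same verdict."""
    if n < 2:
        return False
    for b in BASES:
        if n % b == 0:
            return n == b
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True


def expand(seed: bytes, label: bytes, bits: int) -> int:
    """Stretch the shared random string into a `bits`-bit integer (SHA-256 counter mode)."""
    out, counter = b"", 0
    while len(out) * 8 < bits:
        out += hashlib.sha256(label + seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return int.from_bytes(out, "big") >> (len(out) * 8 - bits)


def safe_prime_from_seed(seed: bytes, bits: int) -> int:
    """Search upward from the seeded value for p with p and (p - 1)/2 both prime."""
    # Set the top bit and force n = 3 (mod 4), which every safe prime p > 5 satisfies.
    n = expand(seed, b"p", bits) | (1 << (bits - 1)) | 3
    while not (is_probable_prime(n) and is_probable_prime((n - 1) // 2)):
        n += 4
    return n


def generator_from_seed(seed: bytes, p: int) -> int:
    """Start from a seeded value and increment until it generates Z_p^* (p = 2q + 1)."""
    q = (p - 1) // 2
    g = 2 + expand(seed, b"g", p.bit_length()) % (p - 3)
    # For a safe prime, g generates Z_p^* iff g^2 != 1 and g^q != 1 (mod p).
    while pow(g, 2, p) == 1 or pow(g, q, p) == 1:
        g = g + 1 if g + 1 < p - 1 else 2
    return g
```

Only the short string crosses the air link. An eavesdropper sees random-looking bits rather than a value that can be tested for primality.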



9 Conclusion 

We started by examining the proposals for OTASP and their limitations. We 
then provided over the air service provisioning methods under various assump- 
tions about knowledge shared between the mobile user/handset and the network. 
We started from strong assumptions like the knowledge of strong keys and moved 
to the weakest assumption, that the mobile and network do not share any secrets 
but if the mobile relays its secret the network can verify it. For this weak as- 
sumption we could only provide a practically secure scheme assuming a limited 
adversary. If agreement on public keys is desired, we can further have the hand- 
set and network generate their respective private and public keys and exchange 
them. Thus we are able to bootstrap from a small secret to a strong public key. 
The table (Figure 5) organizes the various protocols according to the different
assumptions. The left column of the table lists the various assumptions about
the secrets shared between the network and the handset. The top row of the
table lists the four assumptions about what is available: a land line, a
pre-stored public constant such as the CA's public encryption key, pre-stored
g and p, or nothing pre-stored.

If a strong secret is shared, then a standard two-party session key agreement
protocol can be used under all four assumptions. If a weak secret is shared between
the handset and the network, then the schemes of section 4 can be used when land lines
are available; if the CA's public encryption key is pre-stored in the
handset, then the protocol of section 6 can be used; if g and p are pre-stored, then
the secure password scheme can be used; and even when g and p are not
pre-stored, the secure password scheme can be used as long as g and p are verified.
Finally, the weakest assumption on secrets is that the mobile user has a secret,
but the network does not share it and can only verify it when presented by the
mobile user. In this case, the availability of land lines means that the protocols
from section 4 can be used; if the CA's public encryption key is pre-stored, then
the protocol of section 6 can be used here also; if g and p are pre-stored, then
one has to resort to the strengthened version (section 8) of the unauthenticated
Diffie-Hellman key exchange, which makes the OTASP call indistinguishable from
a normal call; even if g and p are not pre-stored, one can use the protocol
in section 8 as long as g and p are carefully transferred as outlined in section 8.

Fig. 5. Summary Table
Abstract. The previously best attack known on elliptic curve cryptosystems
used in practice was the parallel collision search based on
Pollard's ρ-method. The complexity of this attack is the square root of
the prime order of the generating point used. For arbitrary curves, typically
defined over $GF(p)$ or $GF(2^m)$, the attack time can be reduced by
a factor of $\sqrt{2}$, a small improvement. For subfield curves, those defined
over $GF(2^{ed})$ with coefficients defining the curve restricted to $GF(2^e)$,
the attack time can be reduced by a factor of $\sqrt{2d}$. In particular, for
curves over $GF(2^m)$ with coefficients in $GF(2)$, called anomalous binary
curves or Koblitz curves, the attack time can be reduced by a factor
of $\sqrt{2m}$. These curves have structure which allows faster cryptosystem
computations. Unfortunately, this structure also helps the attacker. In
an example, the time required to compute an elliptic curve logarithm
on an anomalous binary curve over $GF(2^{163})$ is reduced from $2^{81}$ to $2^{77}$
elliptic curve operations.



1 Introduction 

Public-key cryptography based on elliptic curves over finite fields was proposed 
by Miller 0 and Koblitz Q in 1985. Elliptic curves over finite fields have been 
used to implement the Diffie-Hellman key passing scheme PQ and also the
elliptic curve variant of the Digital Signature Algorithm. The security of
these cryptosystems relies on the difficulty of solving the elliptic curve discrete
logarithm problem. If P is a point with order n on an elliptic curve, and Q is some
other point on the same curve, then the elliptic curve discrete logarithm problem
is to determine an integer $l$ such that $Q = lP$ and $0 \le l \le n - 1$, if such an $l$ exists.
If this problem can be solved efficiently, then elliptic curve based cryptosystems
can be broken efficiently.

There are known classes of elliptic curves in which solving the discrete
logarithm problem is (relatively) easy. These classes include supersingular curves B
and anomalous curves. The elliptic curve discrete logarithm problem
for supersingular curves can be reduced to the discrete logarithm problem in a
small finite extension of the underlying finite field. The discrete logarithm
problem in the finite field can then be solved in subexponential time. Anomalous
curves are curves defined over the field $GF(p)$ that have exactly p points. The
elliptic curve discrete logarithm problem for anomalous curves can be solved in
$O(\ln p)$ operations. Both supersingular and anomalous curves are easily identified
and excluded from use in cryptographic operations.

The best attack known on the general elliptic curve discrete logarithm
problem is parallel collision search h^'| based on Pollard's ρ algorithm Q, which has
running time proportional to the square root of the largest prime factor dividing 
the curve order. This method works for any cyclic group and does not make 
use of any additional structure present in elliptic curve groups. We show how 
this method can be improved for any elliptic curve logarithm computation by 
exploiting the fact that the negative of a point can be computed very rapidly. 

Certain classes of elliptic curves have been proposed for use in cryptogra- 
phy because of their ability to provide efficiencies in implementation. Among 
these have been subfield curves and anomalous binary or Koblitz curves.
Using the Frobenius endomorphism, we show that these curves also allow a
further speed-up for the parallel collision search algorithm and therefore provide
less security than was originally thought. This is the first time that the extra 
structure provided by these curves has actually been used to attack the cryp- 
tosystems upon which they are based. Independent work in this area has also 
been performed by Robert Gallant, Robert Lambert and Scott Vanstone ^ and 
by Robert Harley, who has used his results to solve the ECC2K-95 Certicom 
challenge problem. 



2 Background 

This section will provide the necessary background material on various properties 
of elliptic curves and will also describe the parallel collision search method for 
computing discrete logarithms. 



2.1 Elliptic Curves Over $GF(p)$

Let $GF(p)$ be a finite field of characteristic $p \ne 2, 3$, and let $a, b \in GF(p)$ satisfy
the inequality $4a^3 + 27b^2 \ne 0$. An elliptic curve, $E_{(a,b)}(GF(p))$, is defined as the
set of points $(x, y) \in GF(p) \times GF(p)$ which satisfy the equation

$$y^2 = x^3 + ax + b,$$

together with a special point, O, called the point at infinity. These points form an
Abelian group under a well-defined addition operation which we now describe.

Let $E_{(a,b)}(GF(p))$ be an elliptic curve and let P and Q be two points on
$E_{(a,b)}(GF(p))$. If $P = O$, then $-P = O$ and $P + Q = Q + P = Q$. Let $P = (x_1, y_1)$
and $Q = (x_2, y_2)$. Then $-P = (x_1, -y_1)$ and $P + (-P) = O$. If $Q \ne -P$, then
$P + Q = (x_3, y_3)$, where

$$x_3 = \mu^2 - x_1 - x_2, \qquad y_3 = \mu(x_1 - x_3) - y_1,$$

and

$$\mu = \begin{cases} \dfrac{y_2 - y_1}{x_2 - x_1} & \text{if } P \ne Q, \\[1ex] \dfrac{3x_1^2 + a}{2y_1} & \text{if } P = Q. \end{cases}$$
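The addition law over $GF(p)$ translates directly into code. The sketch below makes three illustrative assumptions: it uses affine coordinates, it lets None stand for the point at infinity O, and it uses the toy curve $y^2 = x^3 + 2x + 3$ over GF(97). The variable mu is the slope $\mu$ from the formulas.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity O


def ec_add(P: Point, Q: Point, a: int, p: int) -> Point:
    """Add two points on y^2 = x^3 + ax + b over GF(p), p > 3."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # Q = -P, so P + Q = O
    if P == Q:
        mu = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p     # tangent slope
    else:
        mu = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p      # chord slope
    x3 = (mu * mu - x1 - x2) % p
    y3 = (mu * (x1 - x3) - y1) % p
    return (x3, y3)


def ec_neg(P: Point, p: int) -> Point:
    """-P = (x, -y)."""
    return None if P is None else (P[0], -P[1] % p)


# Usage on the toy curve y^2 = x^3 + 2x + 3 over GF(97).
A, B, PRIME = 2, 3, 97
pts = [(x, y) for x in range(PRIME) for y in range(PRIME)
       if (y * y - (x ** 3 + A * x + B)) % PRIME == 0]
P1, P2 = pts[0], pts[2]
print(ec_add(P1, P2, A, PRIME), ec_add(P1, P1, A, PRIME))
```

Checking $Q = -P$ first also covers doubling a point with $y_1 = 0$. That is the only case in which the tangent-slope denominator $2y_1$ would vanish.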



2.2 Elliptic Curves Over $GF(2^m)$

We now consider non-supersingular elliptic curves defined over fields of
characteristic 2. Let $GF(2^m)$ be such a field for some $m > 1$. Then a non-supersingular
elliptic curve is defined to be the set of solutions $(x, y) \in GF(2^m) \times GF(2^m)$ to
the equation

$$y^2 + xy = x^3 + ax^2 + b,$$

where $a, b \in GF(2^m)$ and $b \ne 0$, together with the point on the curve at infinity,
O. We denote this elliptic curve by $E_{(a,b)}(GF(2^m))$ or (when the context is
understood) E.

The points on an elliptic curve form an Abelian group under a well-defined
group operation. The identity of the group operation is the point O. For
$P = (x_1, y_1)$ a point on the curve, we define $-P$ to be $(x_1, y_1 + x_1)$, so
$P + (-P) = (-P) + P = O$. Now suppose P and Q are not O, and $P \ne -Q$. Let P be as
above and $Q = (x_2, y_2)$; then $P + Q = (x_3, y_3)$, where

$$x_3 = \mu^2 + \mu + x_1 + x_2 + a, \qquad y_3 = \mu(x_1 + x_3) + x_3 + y_1,$$

and

$$\mu = \begin{cases} \dfrac{y_2 + y_1}{x_2 + x_1} & \text{if } P \ne Q, \\[1ex] \dfrac{x_1^2 + y_1}{x_1} & \text{if } P = Q. \end{cases}$$
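The characteristic-2 formulas can be checked the same way. This sketch represents GF(2^7) in a polynomial basis with the irreducible trinomial $x^7 + x + 1$, so field elements are 7-bit integers and addition is XOR. As its example it uses the anomalous binary curve $y^2 + xy = x^3 + x^2 + 1$ from Section 2.3. These choices and all names are illustrative assumptions.

```python
M = 7
MOD_POLY = (1 << 7) | (1 << 1) | 1   # x^7 + x + 1, irreducible over GF(2)


def gf_mul(u: int, v: int) -> int:
    """Multiply two elements of GF(2^M) in polynomial basis."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M:                   # degree reached M: reduce by the field polynomial
            u ^= MOD_POLY
    return r


def gf_inv(u: int) -> int:
    """Inverse of a nonzero u, computed as u^(2^M - 2)."""
    result, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            result = gf_mul(result, u)
        u = gf_mul(u, u)
        e >>= 1
    return result


def on_curve(P, a, b) -> bool:
    if P is None:
        return True
    x, y = P
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(a, x2) ^ b


def neg(P):
    return None if P is None else (P[0], P[0] ^ P[1])   # -P = (x, x + y)


def add(P, Q, a):
    if P is None:
        return Q
    if Q is None:
        return P
    if Q == neg(P):
        return None
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        mu = x1 ^ gf_mul(y1, gf_inv(x1))            # (x1^2 + y1) / x1
    else:
        mu = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))       # (y2 + y1) / (x2 + x1)
    x3 = gf_mul(mu, mu) ^ mu ^ x1 ^ x2 ^ a
    y3 = gf_mul(mu, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def mul(k, P, a):
    """Double-and-add scalar multiplication kP."""
    R = None
    while k:
        if k & 1:
            R = add(R, P, a)
        P = add(P, P, a)
        k >>= 1
    return R


A, B = 1, 1
points = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_curve((x, y), A, B)]
print("#E =", len(points) + 1)   # affine points plus O
```

The check $Q = -P$ again comes first. Doubling a point with $x_1 = 0$ (which equals its own negative) therefore never divides by zero.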



2.3 Anomalous Binary and Subfield Curves 

Anomalous binary curves (also known as Koblitz curves) are elliptic curves over
$GF(2^m)$ that have coefficients a and b either 0 or 1. Since it is required that
$b \ne 0$, they must be defined by either the equation

$$y^2 + xy = x^3 + 1$$

or the equation

$$y^2 + xy = x^3 + x^2 + 1.$$



Since these curves allow very efficient implementations of certain elliptic curve 
cryptosystems, they have been particularly attractive to implementors of these 
schemes ^^ 9 . Anomalous binary curves are just a special case of subfield curves 
which have also been proposed for use in elliptic curve cryptography because they
also give efficient implementations. 

If $m = ed$ for $e, d \in \mathbb{Z}_{>0}$, then $GF(2^e) \subseteq GF(2^m)$. Using underlying fields
of this type provides very efficient implementations. If a and b are actually
elements of $GF(2^e)$, then we say that E is a subfield curve. Notice in this case
that $E_{(a,b)}(GF(2^e)) \subseteq E_{(a,b)}(GF(2^m))$.

If e is small, so that the number of points in $E_{(a,b)}(GF(2^e))$ can be easily
counted, there is an easy way to determine the number of points in
$E_{(a,b)}(GF(2^m))$. Denote by #E the number of points in E. Then it is well known that
$\#E_{(a,b)}(GF(2^e)) = 2^e + 1 - t$ for some $|t| \le 2\sqrt{2^e}$. The value t is known as the
trace of the curve. If $\alpha$ and $\beta$ are the two roots of the equation $X^2 - tX + 2^e = 0$,
then $\#E_{(a,b)}(GF(2^m)) = 2^m + 1 - \alpha^d - \beta^d$. This is known as Weil's Theorem.
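Weil's Theorem is easy to apply numerically without computing $\alpha$ and $\beta$ explicitly. Writing $V_k = \alpha^k + \beta^k$, the relation $\alpha^2 = t\alpha - 2^e$ gives the recurrence $V_k = tV_{k-1} - 2^e V_{k-2}$ with $V_0 = 2$ and $V_1 = t$. The following is a minimal sketch; the function name is illustrative.

```python
def extension_order(e: int, count_base: int, d: int) -> int:
    """#E(GF(2^{ed})) from #E(GF(2^e)) by Weil's Theorem."""
    q = 1 << e
    t = q + 1 - count_base                    # trace of the curve over GF(2^e)
    v_prev, v = 2, t                          # V_0 = alpha^0 + beta^0, V_1 = alpha + beta
    for _ in range(d - 1):
        v_prev, v = v, t * v - q * v_prev     # V_k = t V_{k-1} - 2^e V_{k-2}
    return q ** d + 1 - v


# Anomalous binary curves (e = 1): y^2 + xy = x^3 + x^2 + 1 has 2 points over GF(2)
# (including O), and y^2 + xy = x^3 + 1 has 4.
print(extension_order(1, 2, 7), extension_order(1, 4, 7))
```

Only the small count over $GF(2^e)$ has to be found by brute force. The extension-field order then costs d − 1 integer multiply-adds.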

2.4 The Frobenius Endomorphism

An interesting property of anomalous binary curves is that if $P = (x, y)$ is a
point on the curve, then so is $(x^2, y^2)$. In fact $(x^2, y^2) = \lambda P$ for some constant
$\lambda$. We can see this in the general case of subfield curves using the Frobenius
endomorphism.

The Frobenius endomorphism is the function $\psi$ that takes x to $x^{2^e}$ for all
$x \in GF(2^m)$. Since we are working in a field of characteristic 2, notice that
$\psi(r(x)) = r(\psi(x))$ for all $x \in GF(2^m)$ and any rational function r with
coefficients in $GF(2^e)$. If $P = (x, y)$ is a point on the subfield curve E, define
$\psi(P) = (\psi(x), \psi(y))$. Also define $\psi(O) = O$. It can be shown from the curve's
defining equation and the fact that $(a + b)^2 = a^2 + b^2$ for all $a, b \in GF(2^m)$
that if $P \in E$ then $\psi(P) \in E$. Thus if E is a subfield curve and $P, Q \in E$, then
$\psi(P + Q) = \psi(P) + \psi(Q)$.

Now, consider a point $P \in E$ where E is a subfield curve and P has prime
order p with $p^2$ not dividing #E. By the above remarks we have
$p\psi(P) = \psi(pP) = \psi(O) = O$. Hence $\psi(P)$ must also be a point of order p. Since
$p^2 \nmid \#E$, E has only one subgroup of order p, namely the one generated by P; so
we must have $\psi(P) = \lambda P$ for some $\lambda \in \mathbb{Z}$, $1 \le \lambda \le p - 1$. The value $\lambda$ is
constant among all points in the subgroup generated by P and is known as the
eigenvalue of the Frobenius endomorphism.

It is known that for any point $P \in E$, the Frobenius endomorphism satisfies

$$\psi^2(P) - t\psi(P) + 2^e P = O,$$

where t is the trace as defined in Section 2.3. Therefore, it can also be shown
that $\lambda$ is one of the roots of the quadratic congruence

$$X^2 - tX + 2^e \equiv 0 \pmod{p}.$$

Hence, $\lambda$ can be efficiently computed.
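Computing $\lambda$ amounts to solving the quadratic congruence with a modular square root. The sketch below uses Tonelli-Shanks and assumes the discriminant $t^2 - 4 \cdot 2^e$ is a square modulo p. The example numbers come from a toy curve chosen for this illustration, not from the text: $y^2 + xy = x^3 + x^2 + 1$ over GF(2^7), with trace t = 1 over GF(2) and 142 = 2 · 71 points by Weil's Theorem. Of the two roots, this sketch keeps the one with $\lambda^d \equiv 1 \pmod p$, since $\psi^d$ fixes every point defined over $GF(2^{ed})$.

```python
def sqrt_mod(a: int, p: int) -> int:
    """Tonelli-Shanks: a square root of a modulo an odd prime p (a must be a square)."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:
        raise ValueError("no square root exists")
    q, s = p - 1, 0
    while q % 2 == 0:
        q, s = q // 2, s + 1
    z = 2
    while pow(z, (p - 1) // 2, p) != p - 1:      # find a quadratic non-residue
        z += 1
    m, c, t, r = s, pow(z, q, p), pow(a, q, p), pow(a, (q + 1) // 2, p)
    while t != 1:
        i, t2 = 0, t
        while t2 != 1:                            # least i with t^(2^i) = 1
            t2, i = t2 * t2 % p, i + 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c, t, r = i, b * b % p, t * b * b % p, r * b % p
    return r


def frobenius_eigenvalue_candidates(t: int, e: int, p: int) -> list:
    """Both roots of X^2 - tX + 2^e = 0 (mod p)."""
    root = sqrt_mod(t * t - 4 * (1 << e), p)
    inv2 = pow(2, -1, p)
    return sorted({(t + root) * inv2 % p, (t - root) * inv2 % p})


# Toy example (assumed): t = 1, e = 1, subgroup order p = 71, d = 7.
cands = frobenius_eigenvalue_candidates(1, 1, 71)
lam = [x for x in cands if pow(x, 7, 71) == 1]   # psi^7 is the identity on GF(2^7) points
print(cands, lam)
```

The cost is a few modular exponentiations, which is why the text can treat $\lambda$ as essentially free to obtain.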

2.5 Parallel Collision Search 

Given a point Q on an elliptic curve which is in a subgroup of order n generated
by P, we seek $l$ such that $Q = lP$. Pollard's ρ method IfOl proceeds as follows.
Partition the points on the curve into three roughly equal size sets $S_1, S_2, S_3$
based on some simple rule. Define an iteration function on a point Z as follows:

$$f(Z) = \begin{cases} 2Z & \text{if } Z \in S_1, \\ Z + P & \text{if } Z \in S_2, \\ Z + Q & \text{if } Z \in S_3. \end{cases}$$

Choose $A_0, B_0 \in [1, n - 1]$ at random and compute the starting point
$Z_0 = A_0P + B_0Q$. Compute the sequence $Z_1 = f(Z_0)$, $Z_2 = f(Z_1)$, ..., keeping track
of $A_i, B_i$ such that $Z_i = A_iP + B_iQ$. Thus,

$$(Z_{i+1}, A_{i+1}, B_{i+1}) = \begin{cases} (2Z_i, 2A_i, 2B_i) & \text{if } Z_i \in S_1, \\ (Z_i + P, A_i + 1, B_i) & \text{if } Z_i \in S_2, \\ (Z_i + Q, A_i, B_i + 1) & \text{if } Z_i \in S_3. \end{cases}$$

Note that $A_i$ and $B_i$ can be computed modulo n so that they do not grow out
of control. Because the number of points on the curve is finite, the sequence of
points must begin to repeat. Upon detection that $Z_i = Z_j$ we have
$A_iP + B_iQ = A_jP + B_jQ$, which gives $l = (A_i - A_j)/(B_j - B_i) \bmod n$, unless we are very unlucky and
$B_i \equiv B_j \pmod{n}$.

Actually, Pollard's function is not an optimal choice. In Q it is recommended
that the points be divided into about 20 sets of equal size $S_1, \ldots, S_{20}$
and that the iteration function be

$$f(Z) = \begin{cases} Z + c_1P + d_1Q & \text{if } Z \in S_1, \\ Z + c_2P + d_2Q & \text{if } Z \in S_2, \\ \quad\vdots & \quad\vdots \\ Z + c_{20}P + d_{20}Q & \text{if } Z \in S_{20}, \end{cases} \tag{1}$$

where the $c_i$ and $d_i$ are random integers between 1 and $n - 1$. The use of this
iteration function gives a running time very close to that expected by theoretical
estimates. In order to make computation of the values $A_i$ and $B_i$ more efficient,
we suggest that the constants $c_{11}, \ldots, c_{20}$ and $d_1, \ldots, d_{10}$ could be zero, so that only
one of the values $A_i$ or $B_i$ needs to be updated at each stage.

Pollard's ρ method is inherently serial and cannot be directly parallelized over
several processors efficiently. Parallel collision search provides a method for
efficient parallelization. Several processors each create their own starting points
$Z_0$ and iterate until a "distinguished point" $Z_d$ is reached. A point is considered
distinguished if it satisfies some easily tested property, such as having several
leading zero bits. The triples $(Z_d, A_d, B_d)$ are contributed to a memory common
to all processors. When the memory holds two triples containing the same point
$Z_d$, the logarithm $l$ can be computed as with Pollard's ρ method.

The expected number of iterations required to find the logarithm is $\sqrt{\pi n/2}$.
The object of this paper is to reduce this number.
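A compact simulation of parallel collision search with the 20-way iteration function (1) is sketched below. To stay short and self-contained, it does not use an elliptic curve. Instead, as an assumption, it uses the order-1019 subgroup of $Z_{2039}^*$ generated by g = 4, written multiplicatively, so $z = g^A h^B$ plays the role of $Z = AP + BQ$. Three further choices are illustrative only: the partition rule (z mod 20), the distinguished-point rule (z divisible by 16), and the walk-length cap. The "processors" are simulated one after another.

```python
import random

P_MOD, N, G = 2039, 1019, 4     # toy group: order-N subgroup of Z_P_MOD^*, generator G
R_SETS = 20                     # number of partition sets S_1..S_20 in eq. (1)


def collision_search(h: int, seed: int = 0, max_walk: int = 400) -> int:
    """Return l with G^l = h (mod P_MOD) using distinguished-point collision search."""
    rng = random.Random(seed)
    # Precompute the 20 multipliers g^{c_j} h^{d_j} of iteration function (1).
    steps = []
    for _ in range(R_SETS):
        c, d = rng.randrange(1, N), rng.randrange(1, N)
        steps.append((pow(G, c, P_MOD) * pow(h, d, P_MOD) % P_MOD, c, d))
    memory = {}                 # common memory: distinguished point -> (A, B)
    while True:                 # each pass is one "processor" starting a new walk
        a, b = rng.randrange(1, N), rng.randrange(1, N)
        z = pow(G, a, P_MOD) * pow(h, b, P_MOD) % P_MOD
        for _ in range(max_walk):
            if z % 16 == 0:     # distinguished point reached
                break
            mult, c, d = steps[z % R_SETS]
            z, a, b = z * mult % P_MOD, (a + c) % N, (b + d) % N
        else:
            continue            # no distinguished point: the walk is probably cycling
        if z not in memory:
            memory[z] = (a, b)
            continue
        a2, b2 = memory[z]
        if (b2 - b) % N:        # the unlucky case B_i = B_j gives no information
            return (a - a2) * pow((b2 - b) % N, -1, N) % N


secret = 777
print(collision_search(pow(G, secret, P_MOD)))   # prints 777
```

Only the distinguished points are stored, so the shared memory stays small. A collision between two walks is detected at the first distinguished point after their paths merge.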
3 Faster Attacks for Arbitrary Curves

Notice that for elliptic curves over both $GF(p)$ and $GF(2^m)$, given a point $P = (x, y)$
on the curve it is trivial to determine its negative: either $-P = (x, -y)$
(in the $GF(p)$ case) or $-P = (x, x + y)$ (in the $GF(2^m)$ case). Thus, at every
stage of the parallel collision search algorithm, both $Z_i$ and $-Z_i$ can be easily
computed.

We would like to reduce the size of the space that is being searched by parallel
collision search by a factor of 2. We can do this by replacing $Z_i$ with $\pm Z_i$ at each
step in a canonical way. A simple way to do this is to choose the one that has the
smaller y coordinate when its binary representation is interpreted as an integer.

When performing a parallel collision search, $Z_i$, $A_i$ and $B_i$ should be
computed as normal. However, $-Z_i$ should also be computed, and whichever one
of $Z_i$ and $-Z_i$ has the smaller y coordinate should be taken to be $Z_i$. If $Z_i$
has the smaller y coordinate, then everything progresses as normal. If $-Z_i$ has
the smaller y coordinate, then $-Z_i$ should replace $Z_i$, $-A_i$ should replace $A_i$,
and $-B_i$ should replace $B_i$. Notice that the equation $Z_i = A_iP + B_iQ$ is still
maintained.

Thus, the search space for the parallel collision search is reduced to only those
points which have a smaller y coordinate than their negative. Since exactly half
of the points (other than O) have this property, we have reduced the search space by a
factor of 2. Because the extra computational effort in determining which of $Z_i$
and $-Z_i$ to accept is negligible, the expected running time of the algorithm will
be reduced by a factor of $\sqrt{2}$. This improvement in attack time is valid for any
elliptic curve.

A technicality which affects the most obvious application of this technique
is the appearance of trivial 2-cycles. Suppose that $Z_i$ and $Z_{i+1}$ both belong
to the same $S_j$ and that in both cases, after f is applied, the negative of the
resulting point is used. Then $Z_{i+1} = -(Z_i + c_jP + d_jQ)$ (say) and
$Z_{i+2} = -(Z_{i+1} + c_jP + d_jQ) = Z_i$. The occurrence of these 2-cycles is reduced by
using the iteration function given in Equation (1), since it gives more choices for
the multipliers. However, it does not reduce them enough for efficient implementations
to be possible. To reduce the occurrence of 2-cycles even further, we can use a
look-ahead technique which proceeds as follows. Define $f_w(Z) = Z + c_wP + d_wQ$.
Suppose that $Z_i \in S_j$. Then $f(Z_i) = f_j(Z_i)$. Begin by computing $R = \pm f_j(Z_i)$,
a candidate for $Z_{i+1}$. If $R \notin S_j$, then $Z_{i+1} = R$. If $R \in S_j$, then we treat $Z_i$
as though it were in $S_{j+1}$ (where $j + 1$ is reduced modulo 20), and compute a
new candidate $R = \pm f_{j+1}(Z_i)$. If $R \notin S_{j+1}$, then $Z_{i+1} = R$; otherwise continue
trying $j + 2, j + 3, \ldots$. If all 20 choices fail (a very low probability event), then just
use $Z_{i+1} = \pm f_j(Z_i)$. The idea is to reduce the probability that two successive
points will belong to the same set. Note that $Z_{i+1}$ still depends solely on $Z_i$, a
requirement for parallel collision search to work.

This modified iteration function causes the amount of computation to increase
by an expected factor of approximately 20/19, a small penalty which can be
reduced by using more than 20 cases. The occurrence of 2-cycles is not completely
eliminated, but it is significantly reduced. If necessary, it can be reduced
further by using more than 20 cases or by looking ahead two steps instead of just
one. Another way to deal with 2-cycles is to consider them to be distinguished
points.
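The negation map and the look-ahead rule can be added to the same kind of toy simulation. As before, this is an illustrative sketch under assumptions. The group is the order-1019 subgroup of $Z_{2039}^*$, and the group inverse $z^{-1}$ stands in for the point negative $-Z$, so "smaller y coordinate" becomes "smaller integer value" between z and $z^{-1}$. Walks that fall into a fruitless cycle are simply abandoned after a fixed length.

```python
import random

P_MOD, N, G, R_SETS = 2039, 1019, 4, 20   # toy group of order N, 20 partition sets


def canon(z, a, b):
    """Replace z = g^a h^b by the smaller of z and z^{-1}, negating a and b if needed."""
    zi = pow(z, -1, P_MOD)
    return (zi, -a % N, -b % N) if zi < z else (z, a, b)


def step(z, a, b, steps):
    """Look-ahead iteration: try f_j, f_{j+1}, ... until R leaves the set just used."""
    j = z % R_SETS
    for k in range(R_SETS):
        w = (j + k) % R_SETS                    # treat z as if it were in S_w
        mult, c, d = steps[w]
        cand = canon(z * mult % P_MOD, (a + c) % N, (b + d) % N)
        if cand[0] % R_SETS != w:               # candidate R not in S_w: accept it
            return cand
    mult, c, d = steps[j]                       # all 20 choices failed: use +-f_j(z)
    return canon(z * mult % P_MOD, (a + c) % N, (b + d) % N)


def collision_search_negation(h, seed=0, max_walk=400):
    """Collision search over canonical representatives {z, z^{-1}}."""
    rng = random.Random(seed)
    steps = []
    for _ in range(R_SETS):
        c, d = rng.randrange(1, N), rng.randrange(1, N)
        steps.append((pow(G, c, P_MOD) * pow(h, d, P_MOD) % P_MOD, c, d))
    memory = {}
    while True:
        a, b = rng.randrange(1, N), rng.randrange(1, N)
        z, a, b = canon(pow(G, a, P_MOD) * pow(h, b, P_MOD) % P_MOD, a, b)
        for _ in range(max_walk):
            if z % 16 == 0:                     # distinguished point
                break
            z, a, b = step(z, a, b, steps)
        else:
            continue                            # fruitless cycle suspected: restart
        if z not in memory:
            memory[z] = (a, b)
        elif (memory[z][1] - b) % N:
            a2, b2 = memory[z]
            return (a - a2) * pow((b2 - b) % N, -1, N) % N
```

The collision formula is unchanged from plain collision search, because canon negates A and B together with z. The invariant $z = g^A h^B$ therefore holds at every step.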



4 Faster Attacks for Subfield Curves 

We will now describe an attack on subfield curves that again uses parallel collision 
search and will reduce the running time by a factor of \/d when considering curves 
over GF(2«'^). 

Let E(^a,b){GF{2^‘^)) be a subfield curve with a,b G GF(2®) and let P be a 
point on the curve such that not both coordinates are from a proper subfield of 
GP(2«'^). In other words P G P(a,b)GP(2®'^), but P ^ P(a,h)(GP(2®^)) for any 
f, I < f < d — 1. Let P have prime order p such that does not divide the 
order of the curve and let d be odd. These conditions are not restrictive since 
most elliptic curve cryptosystems require the use of points P with prime order 
very close to the curve order, which usually implies the above conditions. 

By these conditions we get that 

V>(P) = AP ^ P, 

^'^-i(P) = X‘^-^P^P, 

^'^(P) = A'^P = P 



which implies that d\p— 1. 

Remember that ip{x) = . Since we are working over a subfield of character- 

istic 2, squaring is always a very efficient operation. In particular when a normal 
basis representation is used, it is just a cyclic shift of the binary representation 
of the field element. Thus '(piP) can be computed very efficiently. 

As in Section 3, we will use parallel collision search and compute $Z_i$, $A_i$ and $B_i$ as usual. We can now also compute the $2d$ different points $\pm\psi^j(Z_i)$ on the curve, for $0 \le j \le d-1$. We would like to choose a “distinguished” or “canonical” representative from this set. We first consider the $d$ points $\psi^j(Z_i)$ and use the one whose x-coordinate has the smallest binary representation when interpreted as an integer. We then choose either that point or its negative, depending on which has the smaller y-coordinate when interpreted as an integer. This point now replaces $Z_i$. If we have chosen $\pm\psi^j(Z_i)$ to replace $Z_i$, we must also replace $A_i$ with $\pm\lambda^j A_i$ and $B_i$ with $\pm\lambda^j B_i$ to maintain the relationship $Z_i = A_iP + B_iQ$. The powers $\lambda^j$ can be precomputed to obtain further efficiencies.
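The bookkeeping for the canonical representative can be sketched in a toy model, which is an assumption made only for illustration. Points of $\langle P \rangle$ are stood in for by their discrete logarithms modulo $n$, so $\psi$ acts as multiplication by $\lambda$ and negation as $z \mapsto -z$. "Smallest integer" replaces the comparison of x- and y-coordinates. The function name `canonicalize` and the parameters $n = 31$, $\lambda = 2$ (order $d = 5$ modulo 31) and $q = 7$ (with $Q = qP$) are hypothetical.

```python
def canonicalize(z, A, B, lam, d, n):
    """Toy model: replace z by the smallest of the 2d values (+/-)lam^j * z mod n,
    and multiply A and B by the same factor so that z = A + B*q (mod n) still holds."""
    best_val, best_factor = None, None
    power = 1                                   # lam^j mod n (could be precomputed)
    for _ in range(d):
        for f in (power, (-power) % n):         # +lam^j and -lam^j
            cand = (f * z) % n
            if best_val is None or cand < best_val:
                best_val, best_factor = cand, f
        power = (power * lam) % n
    return best_val, (best_factor * A) % n, (best_factor * B) % n

# Toy parameters: lam = 2 has order d = 5 modulo n = 31; Q = q*P with q = 7.
n, lam, d, q = 31, 2, 5, 7
A, B = 4, 9
z = (A + B * q) % n                             # Z = A*P + B*Q
z2, A2, B2 = canonicalize(z, A, B, lam, d, n)
```

In a real attack `z` would be a curve point and the comparison would follow the x-then-y rule described above. Only the update of $A_i$ and $B_i$ carries over unchanged.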

By performing the above operation at every step of the parallel collision search, we reduce the size of our search space by a factor of $2d$. Thus, the expected running time to compute the discrete logarithm decreases by a factor of $\sqrt{2d}$.
The iteration function $f$ used in the parallel collision search must be chosen carefully. In particular, notice that if the function is chosen to be a choice between just $2Z$, $Z+P$ and $Z+Q$ (as in the basic parallel collision search algorithm), then in some situations trivial cycles are likely to occur. Notice that for $i < j$, $Z_j$ can be written as $Z_j = p_1(\lambda)Z_i + p_2(\lambda)P + p_3(\lambda)Q$, where $p_1$, $p_2$ and $p_3$ are polynomials in $\lambda$. Also notice that these polynomials will have small coefficients if $j - i$ is not too big. When using anomalous binary curves, the value $\lambda$ satisfies $\lambda^2 + \lambda + 2 \equiv 0$ or $\lambda^2 - \lambda + 2 \equiv 0 \pmod p$. In either case, $\lambda$ is likely to be a root of the polynomials in the expression for $Z_j$, and hence a trivial cycle will be encountered. Experimentation shows that the modified iteration function described in Section 3 reduces the occurrences of these trivial cycles sufficiently for practical purposes.



4.1 Anomalous Binary Curves 

Now consider the situation created by using anomalous binary curves. If $E_{a,b}(GF(2^m))$ is such a curve, then $a, b \in \{0, 1\}$, so we are actually using subfield curves with $e = 1$ and $d = m$.

These curves are particularly well suited to this attack because the size of the space searched is reduced by a factor of $2m$, which reduces the expected running time by a factor of $\sqrt{2m}$. Thus the attacks on anomalous binary curves using this method are the most efficient among all subfield curves.

As an example, consider the anomalous binary curve $E_{1,1}(GF(2^{163}))$. This curve has been considered particularly attractive for implementing elliptic curve cryptosystems since its order is twice a prime close to $2^{162}$. Many standards recommend that elliptic curve cryptosystems use curves whose order is divisible by a prime of at least 160 bits, to obtain an expected attack time of at least $2^{80}$ operations [?].

The conventional parallel collision search method for computing discrete logarithms on this curve is expected to take approximately $2^{81}$ operations. Using the improvements suggested above will reduce this expected running time by a factor of $\sqrt{2 \cdot 163}$ to approximately $2^{77}$ operations. This is below the required level of security imposed by the standards. Thus, this curve should not be used if a security level of $2^{80}$ is desired.
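The figures above can be checked with a few lines of Python. The sketch assumes the usual heuristic that parallel collision search needs about $\sqrt{\pi n/2}$ group operations for a subgroup of prime order $n$. That constant is our assumption; the text only gives the rounded totals.

```python
import math

def log2_pcs_steps(log2_n: float, speedup: float = 1.0) -> float:
    """log2 of sqrt(pi * n / 2) / speedup: heuristic expected number of
    group operations for parallel collision search in a group of prime order n."""
    return 0.5 * (math.log2(math.pi / 2) + log2_n) - math.log2(speedup)

m = 163
conventional = log2_pcs_steps(162)                 # prime order close to 2^162
improved = log2_pcs_steps(162, math.sqrt(2 * m))   # search space shrunk by 2m
print(f"conventional: 2^{conventional:.1f}  improved: 2^{improved:.1f}")
```

Under these assumptions the script reports roughly $2^{81.3}$ and $2^{77.2}$, which is consistent with the rounded figures in the text.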

5 Efficiency Considerations 

It has been shown that the number of group operations required to compute an elliptic curve logarithm can be reduced, but this is of little use if too much added computation is required in each step. In this section we show how to keep the computation low. At each stage of the algorithm we know that the equation $Z_i = A_iP + B_iQ$ holds. At each stage we also have $A_{i+1} = \pm\lambda^j(A_i + c)$ (say) for some $0 \le j \le d-1$ and some multiplier $c$. If we represent $A_i$ as $A_i = (-1)^{u_i}\lambda^{v_i}w_i$, with $u_i \in \{0,1\}$, $0 \le v_i \le d-1$ and $0 \le w_i \le n-1$, then we can compute $w_{i+1}$ as $w_i + (-1)^{u_i}\lambda^{-v_i}c$ and $v_{i+1}$ as $v_i + j \pmod d$, and flip $u_i$ if necessary. The coefficient $B_i$ can be tracked similarly. If there is a precomputed table of $\lambda^{-j}c$ for each $j = 0, \ldots, d-1$ and each multiplier $c$, then the computation on each step consists of additions or subtractions modulo $n$, additions modulo $d$ and sign changes. This is much cheaper than an elliptic curve addition and is not a significant part of the algorithm run-time.
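Here is a minimal sketch of this bookkeeping. It assumes toy values $n = 31$ and $\lambda = 2$, which has order $d = 5$ modulo 31; the helper names are hypothetical. The state $(u, v, w)$ represents $A = (-1)^u \lambda^v w \bmod n$, and each step uses only a table lookup, an addition modulo $n$ and an addition modulo $d$. `pow(x, -1, n)` needs Python 3.8 or later.

```python
def make_table(multipliers, lam, d, n):
    """Precompute lam^(-j) * c mod n for j = 0..d-1 and each multiplier c."""
    lam_inv = pow(lam, -1, n)                 # modular inverse (Python 3.8+)
    table = {}
    for c in multipliers:
        t = c % n
        for j in range(d):
            table[(j, c)] = t                 # lam^(-j) * c
            t = (t * lam_inv) % n
    return table

def step(state, c, j, negate, table, d, n):
    """Apply A -> (+/-) lam^j (A + c) to A = (-1)^u lam^v w without any multiplication."""
    u, v, w = state
    corr = table[(v, c)]                      # lam^(-v) * c
    w = (w + corr) % n if u == 0 else (w - corr) % n
    v = (v + j) % d                           # valid because lam^d = 1 (mod n)
    if negate:
        u ^= 1
    return u, v, w

def value(state, lam, n):
    """Recover A itself; only needed once a collision has been found."""
    u, v, w = state
    a = pow(lam, v, n) * w % n
    return (n - a) % n if u else a
```

The coefficient $B_i$ would be carried by a second state triple driven by the same table.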

We have implemented these ideas on the anomalous binary curve $E_{0,1}(GF(2^{41}))$. The iteration function used 20 multipliers and the look-ahead scheme described in Section 3. Over 15 trials, the experimental run-times were consistent with the expected run-time.

6 Other Attempts for Faster Attacks 

Another way that one might try to take advantage of the Frobenius endomor- 
phism is to use parallel collision search as usual, but to check whether any stored 
distinguished points are negatives of each other or can be mapped to each other 
with the Frobenius endomorphism. This is easiest when using a method for choos- 
ing distinguished points which leaves a point distinguished if the Frobenius map 
is applied. 

Unfortunately, this approach will not work unless the iteration function is 
carefully chosen so that all members of one equivalence class map to the same 
new equivalence class. The principle behind parallel collision search is that each 
distinguished point stands for the entire set of points in the trail leading to the 
distinguished point. A collision occurs because one trail runs into another trail and is led to the common distinguished point. When a collision occurs and is detected, the two distinguished points are identical. The probability of encountering two unequal distinguished points that are related by a Frobenius map and/or a negation map is very low.

Another way to think of this is that the iteration function acts as a random 
mapping and not all distinguished points are equally likely to appear. In fact, 
distinguished points tend to have radically different sized tree structures leading 
into them. The conditional probabilities are such that if a distinguished point 
occurs, it is very likely to have a large tree structure leading into it, making it 
a likely candidate to appear again. However, the distinguished points which are 
Frobenius and/or negation maps of the one which has occurred are not likely to 
have large tree structures. 

It should be noted that the methods presented in Section 4 may also apply 
to any elliptic curve that has an easily computed automorphism that has small 
order modulo the order of the point. For example, consider the curve

$$y^2 = x^3 - ax$$

over $GF(p)$, with point $P = (x_0, y_0)$ of prime order $n$. This curve has complex multiplication by $\mathbb{Z}[i]$. Let $i_p$ be a solution to $x^2 \equiv -1 \pmod p$. Then $\psi_i(P) = (-x_0, i_p y_0) = \lambda_i P$, where $\lambda_i$ is a solution to $\lambda^2 \equiv -1 \pmod n$. Since

$$\psi_i^0(P) = P, \quad \psi_i^1(P) = \lambda_i P, \quad \psi_i^2(P) = -P, \quad \psi_i^3(P) = -\lambda_i P$$

are all distinct, we can reduce the size of our search space by a factor of 4 to get a speed-up of 2 over the general parallel collision search.
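The structure of this automorphism can be checked on a toy curve. The sketch below assumes $p = 13$, $a = 1$ and $i_p = 5$ (since $5^2 \equiv -1 \pmod{13}$); none of these values come from the text. It only verifies three things: $\psi$ maps the curve to itself, $\psi^2$ is negation, and generic orbits have four elements. It does not look for a prime-order point or compute $\lambda_i$.

```python
def on_curve(pt, a, p):
    """Check y^2 = x^3 - a x (mod p)."""
    x, y = pt
    return (y * y - (x ** 3 - a * x)) % p == 0

def psi(pt, i_p, p):
    """psi(x, y) = (-x, i_p * y) with i_p^2 = -1 (mod p)."""
    x, y = pt
    return (-x) % p, (i_p * y) % p

def orbit(pt, i_p, p):
    """The four images psi^0(P), ..., psi^3(P)."""
    out = [pt]
    for _ in range(3):
        out.append(psi(out[-1], i_p, p))
    return out

# Toy curve y^2 = x^3 - x over GF(13).
p, a, i_p = 13, 1, 5
points = [(x, y) for x in range(p) for y in range(p) if on_curve((x, y), a, p)]
```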

Also, consider the curve

$$y^2 = x^3 + b$$

over $GF(p)$, with point $P = (x_0, y_0)$ of prime order $n$. This curve has complex multiplication by $\mathbb{Z}[\omega]$, where $\omega$ is a cube root of unity. Let $\omega_p \ne 1$ be a solution to $x^3 \equiv 1 \pmod p$. Then $\psi_\omega(P) = (\omega_p x_0, y_0) = \lambda_\omega P$, where $\lambda_\omega \ne 1$ is a solution to $\lambda^3 \equiv 1 \pmod n$. Since

$$\pm\psi_\omega^0(P) = \pm P, \quad \pm\psi_\omega^1(P) = \pm\lambda_\omega P, \quad \pm\psi_\omega^2(P) = \pm\lambda_\omega^2 P$$

are all distinct, we can reduce the size of our search space by a factor of 6 to get a speed-up of $\sqrt{6}$ over the general parallel collision search.
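The six-element classes can be checked in the same toy fashion. The sketch below assumes $p = 13$, $b = 2$ and $\omega_p = 3$ (since $3^3 = 27 \equiv 1 \pmod{13}$); these are illustrative choices not taken from the text. It confirms that every image $(\omega_p^k x, \pm y)$ stays on the curve and that the class has six distinct points whenever $x \ne 0$ and $y \ne 0$.

```python
def on_curve(pt, b, p):
    """Check y^2 = x^3 + b (mod p)."""
    x, y = pt
    return (y * y - (x ** 3 + b)) % p == 0

def class_of(pt, omega_p, p):
    """The six points (+/-)psi^k(P), where psi(x, y) = (omega_p * x, y), k = 0, 1, 2."""
    x, y = pt
    out = []
    for k in range(3):
        xk = (pow(omega_p, k, p) * x) % p
        out += [(xk, y % p), (xk, (-y) % p)]
    return out

# Toy curve y^2 = x^3 + 2 over GF(13).
p, b, omega_p = 13, 2, 3
points = [(x, y) for x in range(p) for y in range(p) if on_curve((x, y), b, p)]
```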

7 Conclusion 

Subfield and anomalous binary curves have been attractive to cryptographers 
for quite some time because of the efficiencies they provide both in curve gen- 
eration and in the implementation of cryptographic algorithms. There have also 
been unsubstantiated warnings for quite some time that these curves may be 
more open to attack because of the greater structure that these curves have. 
The results of this paper show that this structure can in fact be used to ob- 
tain faster attacks. While the attack presented here still has a fully exponential 
running time, care should be exercised when choosing these curves regarding 
their expected security level. In certain circumstances these curves may still be 
attractive because of their efficiencies with respect to curves of similar security 
levels. 

These results highlight the fact that more research must be done on the 
cryptanalysis of elliptic curve cryptosystems before we can be fully confident of 
the security level different curves offer. Two open questions remain: 

— Can the ideas presented here be used, possibly in combination with other 
methods to reduce the attack time further? 

— Can similar ideas be applied to other classes of curves or to curves whose 
coefficients do not lie in the subfield? 
Abstract. This paper describes three contributions for efficient implementation of elliptic curve cryptosystems in $GF(2^n)$. The first is a new method for doubling an elliptic curve point, which is simpler to implement than the fastest known method, due to Schroeppel, and which
favors sparse elliptic curve coefficients. The second is a generalized and 
improved version of the Guajardo and Paar’s formulas for computing 
repeated doubling points. The third contribution consists of a new kind 
of projective coordinates that provides the fastest known arithmetic on 
elliptic curves. The algorithms resulting from this new formulation lead 
to a running time improvement for computing a scalar multiplication of 
about 17% over previous projective coordinate methods. 



1 Introduction 

Elliptic curves defined over finite fields of characteristic two have been proposed for Diffie-Hellman type cryptosystems [?]. The calculation of $Q = mP$, for $P$ a point on the elliptic curve and $m$ an integer, is the core operation of elliptic curve public-key cryptosystems. Therefore, reducing the number of field operations required to perform the scalar multiplication $mP$ is crucial for efficient implementation of these cryptosystems.

In this paper we discuss efficient methods for implementing elliptic curve arithmetic. We present better results than those reported in [?]. Our basic technique is to rewrite the elliptic operations (doubling and addition) with less costly field operations (inversions and multiplications), and to replace general field multiplications by multiplications by fixed elliptic coefficients.

The first method is a new formula for doubling a point, i.e., for calculating the sum of two equal points. This method is simpler to implement than Schroeppel's method [?] since it does not require a quadratic solver. If the elliptic curve coefficient $b$ is sparse, i.e., has few 1's in its representation, then multiplication by the constant $b$ is more efficient than a general field multiplication. In that case our new formula should lead to an improvement of up to 12% compared to Schroeppel's method [?]. We also note that our formula can be applied to composite finite fields as well.

In [?], a new approach is introduced for accelerating the computation of repeated doubling points. This method can be viewed as computing consecutive doublings using fractional field arithmetic. We have generalized and improved the formulas presented in that paper. The new formulas can be used to speed up variants of the sliding-window method. For field implementations where the cost-ratio of inversion to multiplication varies from 2.5 to 4 (typical values of practical software field implementations), we expect a speed-up of 7% to 22% in performing a scalar multiplication.

In [?], Schroeppel proposes an algorithm for computing repeated doubling points that removes most of the general field multiplications and favors elliptic curves with sparse coefficients. Using his method, the computation of $2^iP$, $i \ge 2$, requires $i$ field inversions, $i$ multiplications by a fixed constant, one general field multiplication, and a quadratic solver. Since inversion is the most expensive field operation, this method is suitable for finite fields where field inversion is relatively fast. If the cost-ratio of inversion to multiplication is less than 3, this algorithm may be faster than our repeated doubling algorithm.

When field inversion is costly (e.g., for normal basis representation, where the cost-ratio of inversion to multiplication is at least 7 [?]), projective coordinates offer an alternative method for efficiently implementing the elliptic curve arithmetic. Based on our doubling formula, we have developed a new kind of projective coordinates for calculating a multiple of a point. These should lead to an improvement of 38% over the traditional projective coordinates [?] and of 17% over the recent projective coordinates presented in [?].

The remainder of the paper is organized as follows. Section 2 presents a 
brief summary of elliptic curves defined over finite fields of characteristic two. In 
Section 3, we present our doubling point algorithm. Based on this method, we 
describe an algorithm for repeated doubling points in Section 4. In Section 5, we 
describe the new projective coordinates. An implementation of the doubling and 
adding projective algorithms is given in the appendix. 



2 Elliptic Curves over $GF(2^n)$

A non-supersingular elliptic curve $E$ over $GF(2^n)$ is defined to be the set of solutions $(x, y) \in GF(2^n) \times GF(2^n)$ to the equation

$$y^2 + xy = x^3 + ax^2 + b,$$

where $a, b \in GF(2^n)$ and $b \ne 0$, together with the point at infinity, denoted by $O$.

It is well known that $E$ forms a commutative finite group, with $O$ as the group identity, under the addition operation known as the “tangent and chord method”. Explicit rational formulas for the addition rule involve several arithmetic operations (adding, squaring, multiplication and inversion) in the underlying finite field. In what follows, we will only be concerned with formulas for doubling a point $P$ in affine coordinates. Formulas for adding two different points in affine or projective coordinates can be found in [?].

Let $P = (x_1, y_1)$ be a point of $E$. The doubling point formula [?] to compute $2P = (x_2, y_2)$ is given by

$$x_2 = x_1^2 + \frac{b}{x_1^2}, \qquad y_2 = x_1^2 + \left(x_1 + \frac{y_1}{x_1}\right) x_2 + x_2. \qquad (1)$$



Note that the x-coordinate of $2P$ depends only on the x-coordinate of $P$ and the coefficient $b$. However, doubling a point with formula (1) requires two general field multiplications, one multiplication by the constant $b$ and one field inversion.

Schroeppel [?] improved the doubling point formula, saving the multiplication by the constant $b$. His improved doubling point formula is:

$$\begin{cases} x_2 = M^2 + M + a, \\ y_2 = x_1^2 + M \cdot x_2 + x_2, \\ M = x_1 + \dfrac{y_1}{x_1}. \end{cases} \qquad (2)$$

Observe that equating the x-coordinates of the two previous doubling point formulas leads to the quadratic equation for $M$:

$$M^2 + M + a = x_1^2 + \frac{b}{x_1^2}. \qquad (3)$$

If we assume that multiplying by a sparse fixed constant costs about as much as a field addition, and that solving the previous quadratic equation is faster than a field multiplication, then we obtain another method for doubling a point. Its effective cost is one general multiplication and one field inversion. A description of this method, developed by Schroeppel, can be found in [?, pp. 370-371] and [?].

In the next section, we introduce a new doubling point formula which also requires one general field multiplication and one field inversion, but does not depend on a quadratic solver.



3 A New Doubling Point Formula 

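Given an elliptic curve point $P = (x_1, y_1)$, the coordinates of the doubling point $2P = (x_2, y_2)$ can be calculated by the following new doubling point formula:

$$\begin{cases} x_2 = x_1^2 + \dfrac{b}{x_1^2}, \\ y_2 = \dfrac{b}{x_1^2} + a x_2 + (y_1^2 + b)\left(1 + \dfrac{b}{x_1^4}\right). \end{cases} \qquad (4)$$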
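To derive the above formula we transform the y-coordinate of the doubling point formula (2):

$$y_2 = x_1^2 + \left(x_1 + \frac{y_1}{x_1}\right)x_2 + x_2 = \frac{b}{x_1^2} + \frac{x_1^3 + x_1y_1}{x_1^2}\,x_2 = \frac{b}{x_1^2} + \frac{y_1^2 + b + a x_1^2}{x_1^2}\,x_2 = \frac{b}{x_1^2} + a x_2 + (y_1^2 + b)\left(1 + \frac{b}{x_1^4}\right).$$

The derivation uses three facts. First, $x_1^2 + x_2 = b/x_1^2$. Second, the curve equation gives $x_1^3 + x_1y_1 = y_1^2 + ax_1^2 + b$. Third, $x_2/x_1^2 = 1 + b/x_1^4$.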
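To sanity-check formulas (1), (2) and (4) numerically, the following Python sketch uses a toy field $GF(2^7)$ with reduction polynomial $x^7 + x + 1$ and the arbitrary curve coefficients $a = 1$ and $b = 91$. All of these are our own choices. $M$ in formula (2) is computed directly rather than with a quadratic solver.

```python
M_DEG, POLY = 7, 0b10000011      # toy field GF(2^7), modulus x^7 + x + 1
A_COEF, B_COEF = 1, 0b1011011    # toy curve y^2 + xy = x^3 + x^2 + b, b = 91

def fmul(u, v):
    """Multiply in GF(2^7): carry-less product reduced modulo x^7 + x + 1."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M_DEG:
            u ^= POLY
    return r

def finv(u):
    """Inverse of a nonzero u, computed as u^(2^7 - 2)."""
    r, e = 1, (1 << M_DEG) - 2
    while e:
        if e & 1:
            r = fmul(r, u)
        u = fmul(u, u)
        e >>= 1
    return r

def on_curve(P, a=A_COEF, b=B_COEF):
    x, y = P
    return fmul(y, y) ^ fmul(x, y) == fmul(fmul(x, x), x) ^ fmul(a, fmul(x, x)) ^ b

def double_1(P, a=A_COEF, b=B_COEF):
    """Formula (1)."""
    x1, y1 = P
    x2 = fmul(x1, x1) ^ fmul(b, finv(fmul(x1, x1)))
    return x2, fmul(x1, x1) ^ fmul(x1 ^ fmul(y1, finv(x1)), x2) ^ x2

def double_2(P, a=A_COEF, b=B_COEF):
    """Formula (2), with M computed directly instead of by a quadratic solver."""
    x1, y1 = P
    M = x1 ^ fmul(y1, finv(x1))
    x2 = fmul(M, M) ^ M ^ a
    return x2, fmul(x1, x1) ^ fmul(M, x2) ^ x2

def double_4(P, a=A_COEF, b=B_COEF):
    """Formula (4): one inversion, one general multiplication, no M."""
    x1, y1 = P
    t = finv(fmul(x1, x1))           # 1/x1^2
    bt = fmul(b, t)                  # b/x1^2
    x2 = fmul(x1, x1) ^ bt
    # (y1^2 + b) * (1 + b/x1^4), using b/x1^4 = (b/x1^2) * (1/x1^2)
    return x2, bt ^ fmul(a, x2) ^ fmul(fmul(y1, y1) ^ b, 1 ^ fmul(bt, t))

# All affine points with x != 0 (doubling is undefined at the 2-torsion point).
points = [(x, y) for x in range(1, 1 << M_DEG) for y in range(1 << M_DEG)
          if on_curve((x, y))]
```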



3.1 Performance Analysis 

We begin with the observation that our doubling formula eliminates the need for computing the field element $M$ from formula (2), which requires either one general multiplication or a quadratic solver. The calculation of $2P$ requires one general field multiplication, two field multiplications by the fixed constant $b$, and one field multiplication by the constant $a$. This last multiplication can be avoided by choosing the coefficient $a$ to be 0 or 1.¹ Thus, our formula favors elliptic curves with sparse coefficients, i.e., those having relatively few 1's in their representation.

In order to compare the running time of our formula with Schroeppel's method [?] for computing a scalar multiplication, we made the following assumptions:

— Adding and squaring field elements is fast. 

— Multiplying a field element by a sparse constant is comparable to adding. 

— The cost of solving the quadratic equation (3) and determining the right 
solution is about half of that of a field multiplication (this is true for the 
finite field implementation given in [?], but no efficient method is known for tower fields [?]).

The fastest methods for computing a scalar multiplication perform, on average, five point doublings for every point addition. Table 1 compares our formula with Schroeppel's method for a scalar multiplication, for different values of the cost-ratio $r$ of inversion to multiplication.

Therefore, for practical field implementations such as those given in [?], our formula should lead to a running time improvement of up to 12% in computing a scalar multiplication. However, for elliptic curves selected at random (where the coefficient $b$ is not necessarily sparse), neither our method nor Schroeppel's may give a computational advantage. A better algorithm for computing $2^iP$ is presented in the next section.
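The entries of Table 1 can be reproduced from a simple cost model. Beyond the assumptions listed above, the sketch below assumes three further costs:
- An affine addition costs one inversion plus two multiplications.
- Formula (4) costs one inversion plus one general multiplication when $a$ and $b$ are sparse.
- Schroeppel's doubling costs one inversion, one multiplication, and half a multiplication for the quadratic solver.

```python
def table1_row(r):
    """Multiplication counts for 2^5 P + Q (five doublings, one addition) under
    an assumed cost model: inversion = r multiplications; additions, squarings and
    multiplications by sparse a, b are free; the quadratic solver costs 0.5."""
    add = r + 2                           # affine addition: 1 inversion + 2 mults
    new = 5 * (r + 1) + add               # formula (4): 1 inversion + 1 mult
    schroeppel = 5 * (r + 1.5) + add      # formula (2): 1 inversion + 1 mult + solver
    improvement = round(100 * (schroeppel - new) / schroeppel)
    return new, schroeppel, improvement

for r in (2, 2.5, 3, 4):
    print(r, *table1_row(r))
```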



4 Repeated Doubling Algorithm 

We present a method for computing repeated doublings, $2^iP$, $i \ge 2$, which is based on fractional field arithmetic and the doubling formula. The idea is to successively compute the elliptic points $2^jP = (x_j, y_j)$, $j = 1, 2, \ldots, i$, as triples $(v_j, w_j, \delta_j)$ of field elements, where $x_j = v_j/\delta_j$ and $y_j = w_j/\delta_j^2$. The exact formulation is given in Theorem 1.

¹ $E$ is isomorphic to $E'$: $y^2 + xy = x^3 + \alpha x^2 + b$, where $\mathrm{Tr}(\alpha) = \mathrm{Tr}(a)$, $\alpha = 0$ or $\gamma$, and $\mathrm{Tr}(\gamma) = 1$ (if $n$ is odd, we can take $\gamma = 1$); see [?, p. 39].

Table 1. The number of field multiplications for computing $2^5P + Q$.

Cost ratio    New formula    Schroeppel [?]    Improv.
              #Mult.         #Mult.            %
r = 2         19             21.5              12
r = 2.5       22             24.5              10
r = 3         25             27.5              9
r = 4         31             33.5              7

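Theorem 1. Let $P = (x, y)$ be a point on the elliptic curve $E$. Then the coordinates of the point $2^iP = (x_i, y_i)$, $i \ge 2$, are given by

$$x_i = \frac{v_i}{\delta_i}, \qquad (5)$$

$$y_i = \frac{w_i}{\delta_i^2}, \qquad (6)$$

where

$$v_{k+1} = v_k^4 + b\,\delta_k^4, \qquad v_0 = x,$$
$$\delta_{k+1} = v_k^2\,\delta_k^2, \qquad \delta_0 = 1,$$
$$w_{k+1} = b\,\delta_k^4\,\delta_{k+1} + v_{k+1}\left(a\,\delta_{k+1} + w_k^2 + b\,\delta_k^4\right), \qquad w_0 = y, \qquad 0 \le k < i.$$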

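Proof. We will prove by induction on $i$ that $x_i = v_i/\delta_i$ and $y_i = w_i/\delta_i^2$. This is easily checked for $i = 1$. Now assume that the statement is true for $i = n$; we prove it for $i = n + 1$. From the x-coordinate of the doubling formula,

$$x_{n+1} = x_n^2 + \frac{b}{x_n^2} = \frac{v_n^2}{\delta_n^2} + \frac{b\,\delta_n^2}{v_n^2} = \frac{v_n^4 + b\,\delta_n^4}{v_n^2\,\delta_n^2} = \frac{v_{n+1}}{\delta_{n+1}}.$$

Similarly, for $y_{n+1}$ we apply formula (4). We use $b/x_n^2 = b\,\delta_n^4/\delta_{n+1}$, $y_n^2 + b = (w_n^2 + b\,\delta_n^4)/\delta_n^4$ and $1 + b/x_n^4 = v_{n+1}/v_n^4$, and obtain:

$$y_{n+1} = \frac{b}{x_n^2} + a\,x_{n+1} + (y_n^2 + b)\left(1 + \frac{b}{x_n^4}\right) = \frac{b\,\delta_n^4}{\delta_{n+1}} + \frac{a\,v_{n+1}}{\delta_{n+1}} + \frac{(w_n^2 + b\,\delta_n^4)\,v_{n+1}}{\delta_{n+1}^2} = \frac{w_{n+1}}{\delta_{n+1}^2}.$$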
The following algorithm, based on Theorem 1, implements repeated doublings in terms of the affine coordinates of $P = (x, y)$.



Fig. 1. Algorithm 1: Repeated doubling points.

Input: P = (x, y) ∈ E, i ≥ 2.
Output: Q = 2^i P.

  Set V ← x^2, D ← V, W ← y, T ← b.
  for k = 1 to i − 1 do
    Set V ← V^2 + T.
    Set W ← D·T + V·(aD + W^2 + T).
    if k ≠ i − 1 then
      V ← V^2, D ← D^2, T ← b·D^2, D ← D·V.
    fi
  od
  Set D ← D·V.
  Set M ← D^(−1)·(V^2 + W).
  Set x ← D^(−1)·V^2.
  Set x_i ← M^2 + M + a, y_i ← x^2 + M·x_i + x_i.
  return (Q = (x_i, y_i)).



Note that the correctness of this algorithm follows directly from the proof of 
Theorem 1 and formula (2). 
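Below is a direct Python transcription of Algorithm 1, checked against $i$ successive affine doublings with formula (2). The field $GF(2^7)$ (modulus $x^7 + x + 1$) and the curve coefficients are toy choices of ours, not taken from the text. Points whose doubling chain reaches $x = 0$ (a point of order 2) are skipped, which matches the order assumption of Corollary 1 below.

```python
M_DEG, POLY = 7, 0b10000011      # toy field GF(2^7), modulus x^7 + x + 1
A_COEF, B_COEF = 1, 0b1011011    # toy curve coefficients a = 1, b = 91

def fmul(u, v):
    """Carry-less multiplication reduced modulo x^7 + x + 1."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M_DEG:
            u ^= POLY
    return r

def finv(u):
    """Inverse of a nonzero u as u^(2^7 - 2)."""
    r, e = 1, (1 << M_DEG) - 2
    while e:
        if e & 1:
            r = fmul(r, u)
        u = fmul(u, u)
        e >>= 1
    return r

def sq(u):
    return fmul(u, u)

def on_curve(P, a=A_COEF, b=B_COEF):
    x, y = P
    return sq(y) ^ fmul(x, y) == fmul(sq(x), x) ^ fmul(a, sq(x)) ^ b

def affine_double(P, a=A_COEF):
    """Formula (2) with M = x + y/x; x must be nonzero."""
    x, y = P
    M = x ^ fmul(y, finv(x))
    x2 = sq(M) ^ M ^ a
    return x2, sq(x) ^ fmul(M, x2) ^ x2

def repeated_doubling(P, i, a=A_COEF, b=B_COEF):
    """Algorithm 1: compute 2^i P (i >= 2) using a single field inversion."""
    x, y = P
    V, W, T = sq(x), y, b
    D = V
    for k in range(1, i):
        V = sq(V) ^ T                                      # v_k
        W = fmul(D, T) ^ fmul(V, fmul(a, D) ^ sq(W) ^ T)   # w_k
        if k != i - 1:
            V = sq(V)
            D = sq(D)
            T = fmul(b, sq(D))                             # b * delta_k^4
            D = fmul(D, V)                                 # delta_{k+1}
    D = fmul(D, V)
    D_inv = finv(D)                                        # the only inversion
    M = fmul(D_inv, sq(V) ^ W)
    x_prev = fmul(D_inv, sq(V))                            # x-coordinate of 2^(i-1) P
    xi = sq(M) ^ M ^ a
    return xi, sq(x_prev) ^ fmul(M, xi) ^ xi

def compare_with_affine(max_i=5):
    """Return (number of (P, i) pairs compared, list of mismatches)."""
    checked, bad = 0, []
    pts = [(x, y) for x in range(1 << M_DEG) for y in range(1 << M_DEG)
           if on_curve((x, y))]
    for P in pts:
        for i in range(2, max_i + 1):
            Q = P
            for _ in range(i):
                if Q[0] == 0:          # point of order 2: affine doubling undefined
                    break
                Q = affine_double(Q)
            else:
                checked += 1
                if repeated_doubling(P, i) != Q:
                    bad.append((P, i))
    return checked, bad
```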

Corollary 1. Assume that $P$ is an elliptic point of order larger than $2^i$. Then Algorithm 1 performs $3i - 1$ general field multiplications, $i - 1$ multiplications by the fixed constant $b$, and $5i - 4$ field squarings (plus a single field inversion).



4.1 Complexity Comparison 

Since Algorithm 1 cuts down the number of field inversions at the expense of more field multiplications, the computational advantage of Algorithm 1 over repeated doubling (using the standard point doubling formula (2)) depends on $r$, the cost-ratio of inversion to multiplication. Assuming that adding and squaring are fast, we conclude from Corollary 1 that Algorithm 1 outperforms the computation of five consecutive doublings when $r > 2$. Table 2 shows the number of field multiplications needed for computing $2^5P + Q$ for several methods and for different values of $r$. Note that the standard algorithm and Guajardo and Paar's formulas do not use the elliptic coefficient $b$, whereas Algorithm 1 does.
Table 2. Comparison of Algorithm 1 with other algorithms.

Ratio   Algorithm 1           Schroeppel [?]        G.P. [?]    Standard (2)
r       b sparse  b random    b sparse  b random    b random    b random
2.5     21        25          18.5      22.5        27          27
3       22        26          21.5      25.5        28          30
3.5     23        27          24.5      28.5        29          33
4       24        28          27.5      31.5        30          36



Algorithm 1 obtains its best performance for field implementations where $r$ is at least three. If the elliptic curve is selected at random, then we expect Algorithm 1 to be up to 22% faster than the standard algorithm. For field implementations where $r < 3$ (for example, [?]), Schroeppel's method [?] outperforms Algorithm 1.
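The Algorithm 1 and Standard columns of Table 2 follow from Corollary 1 and formula (2) under an assumed cost model. In that model, squarings and additions are free, an inversion costs $r$ multiplications, and the final affine addition costs $r + 2$. The Schroeppel and G.P. columns are not modelled here.

```python
def affine_add_cost(r):
    """Affine addition: one inversion and two general multiplications."""
    return r + 2

def alg1_cost(r, i=5, b_sparse=True):
    """Corollary 1: 3i - 1 general multiplications, i - 1 multiplications by b
    (treated as free when b is sparse), and one inversion."""
    return (3 * i - 1) + (0 if b_sparse else i - 1) + r

def standard_cost(r, i=5):
    """i doublings with formula (2): one inversion and two multiplications each."""
    return i * (r + 2)

for r in (2.5, 3, 3.5, 4):
    row = (alg1_cost(r) + affine_add_cost(r),
           alg1_cost(r, b_sparse=False) + affine_add_cost(r),
           standard_cost(r) + affine_add_cost(r))
    print(r, row)
```

With $b$ random, Algorithm 1 and five standard doublings break even at $r = 2$, which is where the "$r > 2$" condition in the text comes from.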



5 A New Kind of Projective Coordinates 

When field inversion in $GF(2^n)$ is relatively expensive, it may be of computational advantage to use fractional field arithmetic to perform elliptic curve additions as well as doublings. This is done with the use of projective coordinates.



5.1 Basic Facts 

A projective plane is defined to be the set of equivalence classes of triples $(X, Y, Z)$, not all zero. Two triples $(X_1, Y_1, Z_1)$ and $(X_2, Y_2, Z_2)$ are said to be equivalent if there exists $\lambda \in GF(2^n)$, $\lambda \ne 0$, such that $X_1 = \lambda X_2$, $Y_1 = \lambda^2 Y_2$ and $Z_1 = \lambda Z_2$. Each equivalence class is called a projective point. Note that if a projective point $P = (X, Y, Z)$ has nonzero $Z$, then $P$ can be represented by the projective point $(x, y, 1)$, where $x = X/Z$ and $y = Y/Z^2$. Therefore, the projective plane can be identified with all points $(x, y)$ of the ordinary (affine) plane plus the points for which $Z = 0$.

Any equation $f(x, y) = 0$ of a curve in the affine plane corresponds to an equation $F(X, Y, Z) = 0$. Here $F$ is obtained by substituting $x = X/Z$, $y = Y/Z^2$ and multiplying by a power of $Z$ to clear the denominators. In particular, the projective equation of the affine equation $y^2 + xy = x^3 + ax^2 + b$ is given by

$$Y^2 + XYZ = X^3Z + aX^2Z^2 + bZ^4.$$

If $Z = 0$ in this equation, then $Y^2 = 0$, i.e., $Y = 0$. Therefore, $(1, 0, 0)$ is the only projective point with $Z = 0$ that satisfies the equation. This point is called the point at infinity (denoted $O$).

The resulting projective elliptic equation is

$$E = \{(X, Y, Z) \in \mathbb{P}^2 : Y^2 + XYZ = X^3Z + aX^2Z^2 + bZ^4\}.$$
To convert an affine point $(x, y)$ to a projective point, one sets $X = x$, $Y = y$, $Z = 1$. Similarly, to convert a projective point $(X, Y, Z)$ to an affine point, we compute $x = X/Z$, $y = Y/Z^2$. The projective coordinates of the point $-P(X, Y, Z)$ are given by $-P(X, Y, Z) = (X, XZ + Y, Z)$. The algorithms for adding two projective points are given below.

5.2 Projective Elliptic Arithmetic 

In this section we present new formulas for doubling and adding elliptic curve points in projective coordinates.

Projective Elliptic Doubling 

The projective form of the doubling formula is

$$2(X_1, Y_1, Z_1) = (X_2, Y_2, Z_2),$$

where

$$Z_2 = Z_1^2 \cdot X_1^2,$$
$$X_2 = X_1^4 + b \cdot Z_1^4,$$
$$Y_2 = bZ_1^4 \cdot Z_2 + X_2 \cdot (aZ_2 + Y_1^2 + bZ_1^4).$$
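These doubling formulas can be checked against the affine formula (2) in a toy setting of our own: $GF(2^7)$ with modulus $x^7 + x + 1$ and arbitrary coefficients $a = 1$, $b = 91$. Each affine point is lifted to $(\lambda x, \lambda^2 y, \lambda)$ for a few nonzero $\lambda$. It is then doubled projectively and converted back with $x = X/Z$, $y = Y/Z^2$.

```python
M_DEG, POLY = 7, 0b10000011      # toy field GF(2^7), modulus x^7 + x + 1
A_COEF, B_COEF = 1, 0b1011011    # toy curve coefficients a = 1, b = 91

def fmul(u, v):
    """Carry-less multiplication reduced modulo x^7 + x + 1."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M_DEG:
            u ^= POLY
    return r

def finv(u):
    """Inverse of a nonzero u as u^(2^7 - 2)."""
    r, e = 1, (1 << M_DEG) - 2
    while e:
        if e & 1:
            r = fmul(r, u)
        u = fmul(u, u)
        e >>= 1
    return r

def sq(u):
    return fmul(u, u)

def on_curve(P, a=A_COEF, b=B_COEF):
    x, y = P
    return sq(y) ^ fmul(x, y) == fmul(sq(x), x) ^ fmul(a, sq(x)) ^ b

def affine_double(P, a=A_COEF):
    """Formula (2), used as the reference."""
    x, y = P
    M = x ^ fmul(y, finv(x))
    x2 = sq(M) ^ M ^ a
    return x2, sq(x) ^ fmul(M, x2) ^ x2

def proj_double(P, a=A_COEF, b=B_COEF):
    """Projective doubling in coordinates x = X/Z, y = Y/Z^2."""
    X1, Y1, Z1 = P
    if X1 == 0 or Z1 == 0:
        return (1, 0, 0)                   # result is the point at infinity
    Z1_2, X1_2 = sq(Z1), sq(X1)
    Z2 = fmul(X1_2, Z1_2)                  # Z1^2 * X1^2
    bZ4 = fmul(b, sq(Z1_2))                # b * Z1^4
    X2 = sq(X1_2) ^ bZ4                    # X1^4 + b * Z1^4
    Y2 = fmul(bZ4, Z2) ^ fmul(X2, fmul(a, Z2) ^ sq(Y1) ^ bZ4)
    return X2, Y2, Z2

def to_projective(P, lam):
    """(x, y) -> (lam*x, lam^2*y, lam) for any nonzero lam."""
    x, y = P
    return fmul(lam, x), fmul(sq(lam), y), lam

def to_affine(P):
    X, Y, Z = P
    z_inv = finv(Z)
    return fmul(X, z_inv), fmul(Y, sq(z_inv))

points = [(x, y) for x in range(1, 1 << M_DEG) for y in range(1 << M_DEG)
          if on_curve((x, y))]
```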

Projective Elliptic Addition 
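The projective form of the adding formula is

$$(X_0, Y_0, Z_0) + (X_1, Y_1, Z_1) = (X_2, Y_2, Z_2),$$

where

$$A_0 = Y_1 \cdot Z_0^2, \quad A_1 = Y_0 \cdot Z_1^2, \quad B_0 = X_1 \cdot Z_0, \quad B_1 = X_0 \cdot Z_1,$$
$$C = A_0 + A_1, \quad D = B_0 + B_1, \quad E = Z_0 \cdot Z_1, \quad F = D \cdot E,$$
$$Z_2 = F^2, \quad G = D^2 \cdot (F + aE^2), \quad H = C \cdot F, \quad X_2 = C^2 + H + G,$$
$$I = D^2 \cdot B_0 \cdot E + X_2, \quad J = D^2 \cdot A_0 + X_2, \quad Y_2 = H \cdot I + Z_2 \cdot J.$$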

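These formulas can be improved for the special case $Z_1 = 1$:

$$(X_0, Y_0, Z_0) + (X_1, Y_1, 1) = (X_2, Y_2, Z_2),$$

where

$$A = Y_1 \cdot Z_0^2 + Y_0, \quad B = X_1 \cdot Z_0 + X_0, \quad C = Z_0 \cdot B, \quad D = B^2 \cdot (C + aZ_0^2),$$
$$Z_2 = C^2, \quad E = A \cdot C, \quad X_2 = A^2 + D + E,$$
$$F = X_2 + X_1 \cdot Z_2, \quad G = X_2 + Y_1 \cdot Z_2, \quad Y_2 = E \cdot F + Z_2 \cdot G.$$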
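The next sketch uses the same kind of toy setting, again our own choice of $GF(2^7)$ with $a = 1$ and $b = 91$. It implements the general and the mixed ($Z_1 = 1$) addition formulas and compares both against the affine chord rule. That rule is $\lambda = (y_1 + y_2)/(x_1 + x_2)$, $x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$, $y_3 = \lambda(x_1 + x_3) + x_3 + y_1$. It is the standard affine addition for this curve form, but it is not written out in this section.

```python
M_DEG, POLY = 7, 0b10000011      # toy field GF(2^7), modulus x^7 + x + 1
A_COEF, B_COEF = 1, 0b1011011    # toy curve coefficients a = 1, b = 91

def fmul(u, v):
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M_DEG:
            u ^= POLY
    return r

def finv(u):
    r, e = 1, (1 << M_DEG) - 2
    while e:
        if e & 1:
            r = fmul(r, u)
        u = fmul(u, u)
        e >>= 1
    return r

def sq(u):
    return fmul(u, u)

def on_curve(P, a=A_COEF, b=B_COEF):
    x, y = P
    return sq(y) ^ fmul(x, y) == fmul(sq(x), x) ^ fmul(a, sq(x)) ^ b

def to_projective(P, lam):
    x, y = P
    return fmul(lam, x), fmul(sq(lam), y), lam

def to_affine(P):
    X, Y, Z = P
    z_inv = finv(Z)
    return fmul(X, z_inv), fmul(Y, sq(z_inv))

def affine_add(P, Q, a=A_COEF):
    """Chord rule for P != +/-Q (the x-coordinates must differ)."""
    (x1, y1), (x2, y2) = P, Q
    lam = fmul(y1 ^ y2, finv(x1 ^ x2))
    x3 = sq(lam) ^ lam ^ x1 ^ x2 ^ a
    return x3, fmul(lam, x1 ^ x3) ^ x3 ^ y1

def proj_add(P0, P1, a=A_COEF):
    """General projective addition (both points in (X/Z, Y/Z^2) form)."""
    (X0, Y0, Z0), (X1, Y1, Z1) = P0, P1
    A0, A1 = fmul(Y1, sq(Z0)), fmul(Y0, sq(Z1))
    B0, B1 = fmul(X1, Z0), fmul(X0, Z1)
    C, D = A0 ^ A1, B0 ^ B1
    E = fmul(Z0, Z1)
    F = fmul(D, E)
    Z2, D2 = sq(F), sq(D)
    G = fmul(D2, F ^ fmul(a, sq(E)))
    H = fmul(C, F)
    X2 = sq(C) ^ H ^ G
    I = fmul(fmul(D2, B0), E) ^ X2
    J = fmul(D2, A0) ^ X2
    return X2, fmul(H, I) ^ fmul(Z2, J), Z2

def mixed_add(P0, P1, a=A_COEF):
    """Special case Z1 = 1: P0 projective, P1 = (x1, y1) affine."""
    (X0, Y0, Z0), (x1, y1) = P0, P1
    Z0_2 = sq(Z0)
    A = fmul(y1, Z0_2) ^ Y0
    B = fmul(x1, Z0) ^ X0
    C = fmul(Z0, B)
    D = fmul(sq(B), C ^ fmul(a, Z0_2))
    Z2, E = sq(C), fmul(A, C)
    X2 = sq(A) ^ D ^ E
    F, G = X2 ^ fmul(x1, Z2), X2 ^ fmul(y1, Z2)
    return X2, fmul(E, F) ^ fmul(Z2, G), Z2

points = [(x, y) for x in range(1 << M_DEG) for y in range(1 << M_DEG)
          if on_curve((x, y))]
```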



5.3 Performance Analysis

The new projective doubling algorithm requires three general field multiplications, two multiplications by a fixed constant, and five squarings. It takes one general field multiplication less than the previous projective doubling algorithm given in [?], so we obtain an improvement of about 20% for doubling a point in general. For sparse coefficients $b$, we may obtain an improvement of up to 25%.

The new projective adding algorithm requires 13 general multiplications, one multiplication by a fixed constant and six squarings. If $a = 0$ (or $a = 1$) and $Z_1 = 1$, then only nine general field multiplications and four squarings are required. Thus, we obtain one field multiplication less than the previous projective addition algorithm presented in [?]. The number of field operations required to perform an elliptic addition for various kinds of projective coordinates is listed in Table 3.

Now we can estimate the improvement of a scalar multiplication using the new projective coordinates. We will consider only the case $a = 0$ (or $a = 1$) and $Z_1 = 1$, since for this situation we obtain the best improvement. The number of field operations for computing $2^5P + Q$ is given in Table 3. Using these values we can conclude that a scalar multiplication based on the new projective coordinates is on average 17% and 38% faster than with the previous projective coordinates $(x/z^2, y/z^3)$ and $(x/z, y/z)$, respectively.



Table 3. The number of field operations for $2^5P + Q$ ($a = 0$ or $1$, $Z_1 = 1$).

Projective         Doubling         Adding           Cost of 2^5P + Q
coordinates        #Mult.  #Sqr.    #Mult.  #Sqr.    #Mult.  #Sqr.
(x/z, y/z^2)       4       5        9       4        29      29
(x/z^2, y/z^3)     5       5        10      4        35      29
(x/z, y/z)         7       5        12      1        47      26
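The totals in Table 3 and the 17% and 38% figures follow from the per-operation counts by simple arithmetic, using five doublings and one addition for $2^5P + Q$. A short check in Python, with the counts copied from Table 3:

```python
# (mults, squarings) per doubling and per addition, copied from Table 3
COUNTS = {
    "(x/z, y/z^2)": ((4, 5), (9, 4)),
    "(x/z^2, y/z^3)": ((5, 5), (10, 4)),
    "(x/z, y/z)": ((7, 5), (12, 1)),
}

def cost_2_5_P_plus_Q(dbl, add, doublings=5):
    """(mults, squarings) for five doublings followed by one addition."""
    return doublings * dbl[0] + add[0], doublings * dbl[1] + add[1]

new_mults = cost_2_5_P_plus_Q(*COUNTS["(x/z, y/z^2)"])[0]
for name, (dbl, add) in COUNTS.items():
    mults, sqrs = cost_2_5_P_plus_Q(dbl, add)
    saving = round(100 * (mults - new_mults) / mults)
    print(f"{name:15s} {mults:3d} mult {sqrs:3d} sqr  saving {saving}%")
```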
6 Conclusions

We have presented improved methods for faster implementation of the arithmetic 
of an elliptic curve defined over $GF(2^n)$. Our methods are easy to implement
and can be applied to all elliptic curves defined over fields of characteristic two, 
independently of the specific field representation. They favor sparse elliptic co- 
efficients but also perform well for elliptic curves selected at random. In general, 
they should lead to an improvement of up to 20% in the computation of scalar 
multiplication. 
8 Appendix

Algorithm 2: Projective Elliptic Doubling Algorithm 

Input: the finite field $GF(2^m)$; the field elements $a$ and $c$ (where $c^2 = b$) defining a curve $E$ over $GF(2^m)$; projective coordinates $(X_1, Y_1, Z_1)$ for a point $P_1$ on $E$.

Output: projective coordinates $(X_2, Y_2, Z_2)$ for the point $P_2 = 2P_1$.

1. T1 ← X1
2. T2 ← Y1
3. T3 ← Z1
4. T4 ← c
5. if T1 = 0 or T3 = 0 then
     output (1, 0, 0) and stop.
6. T3 ← T3^2
7. T4 ← T3 × T4
8. T4 ← T4^2
9. T1 ← T1^2
10. T3 ← T1 × T3   (= Z2)
11. T1 ← T1^2
12. T1 ← T4 + T1   (= X2)
13. T2 ← T2^2
14. if a ≠ 0 then
      T5 ← a
      T5 ← T3 × T5
      T2 ← T5 + T2
15. T2 ← T4 + T2
16. T2 ← T1 × T2
17. T4 ← T3 × T4
18. T2 ← T4 + T2   (= Y2)
19. X2 ← T1
20. Y2 ← T2
21. Z2 ← T3



This algorithm requires 3 general field multiplications, 5 field squarings and 5 
temporary variables. If also a = 0, then only 4 temporary variables are required. 
Algorithm 3: Projective Elliptic Adding Algorithm

Input: the finite field $GF(2^m)$; the field elements $a$ and $b$ defining a curve $E$ over $GF(2^m)$; projective coordinates $(X_0, Y_0, Z_0)$ and $(X_1, Y_1, 1)$ for points $P_0$ and $P_1$ on $E$.

Output: projective coordinates $(X_2, Y_2, Z_2)$ for the point $P_2 = P_0 + P_1$, unless $P_0 = P_1$. In this case, the triple $(0, 0, 0)$ is returned. (The triple $(0, 0, 0)$ is not a valid projective point on the curve, but rather a marker indicating that the Doubling Algorithm should be used; see [?].)

1. T1 ← X0
2. T2 ← Y0
3. T3 ← Z0
4. T4 ← X1
5. T5 ← Y1
6. T6 ← T4 × T3
7. T1 ← T6 + T1   (= B)
8. T6 ← T3^2
9. if a ≠ 0 then
     T7 ← a
     T7 ← T6 × T7
10. T6 ← T5 × T6
11. T2 ← T6 + T2   (= A)
12. if T1 = 0 then
      if T2 = 0 then output (0, 0, 0) and stop,
      else output (1, 0, 0) and stop.
13. T6 ← T3 × T1   (= C)
14. T1 ← T1^2
15. if a ≠ 0 then
      T7 ← T6 + T7
      T1 ← T7 × T1   (= D)
    else T1 ← T6 × T1   (= D)
16. T3 ← T6^2   (= Z2)
17. T6 ← T2 × T6   (= E)
18. T1 ← T6 + T1
19. T2 ← T2^2
20. T1 ← T2 + T1   (= X2)
21. T4 ← T3 × T4
22. T5 ← T3 × T5
23. T4 ← T1 + T4   (= F)
24. T5 ← T1 + T5   (= G)
25. T4 ← T6 × T4
26. T5 ← T3 × T5
27. T2 ← T4 + T5   (= Y2)
28. X2 ← T1
29. Y2 ← T2
30. Z2 ← T3





This algorithm requires 9 general field multiplications, 4 field squarings and 7 
temporary variables. If also a = 0, then only 6 temporary variables are required. 
Abstract. At SAC ’97, Itoh, Okamoto and Mambo presented a fast public key cryptosystem. After analyzing several attacks including lattice-reduction attacks, they claimed that its security was high, although the
cryptosystem had some resemblances with the former knapsack cryp- 
tosystems, since decryption could be viewed as a multiplicative knapsack 
problem. In this paper, we show how to recover the private key from a 
fraction of the public key in less than 10 minutes for the suggested choice 
of parameters. The attack is based on a systematic use of the notion of 
the orthogonal lattice which we introduced as a cryptographic tool at 
Crypto ’97. This notion allows us to attack the linearity hidden in the 
scheme. 



1 Introduction 

Two decades after the discovery of public key cryptography, only a few asym- 
metric encryption schemes exist, and the most practical public key schemes are 
still very slow compared to conventional secret key schemes. Extensive research 
has been conducted on public-key cryptography based on the knapsack problem. Knapsack-like cryptosystems are quite interesting: they are easy to implement, can attain very high encryption/decryption rates, and do not require expensive operations. Unfortunately, all the cryptosystems based on the additive knapsack
problem have been broken, mainly by means of lattice-reduction techniques. 
Linearity is probably the biggest weakness of these schemes. 

To overcome this problem, multiplicative knapsacks have been proposed as an alternative. The idea of the multiplicative knapsack is roughly 20 years old and was first proposed in the open literature by Merkle and Hellman [?] in their original paper. The Merkle-Hellman knapsack was (partially) cryptanalyzed by Odlyzko [?], partly because only decryption was actually multiplicative, while encryption was additive.

Recently, two public-key cryptosystems based on the multiplicative knapsack problem have been proposed: the Naccache-Stern cryptosystem [?] presented at Eurocrypt ’97, and the Itoh-Okamoto-Mambo cryptosystem [?] presented at SAC ’97. In the latter, both encryption and decryption were relatively fast. After analyzing several attacks including lattice-reduction attacks, Itoh, Okamoto and Mambo claimed that the security of their cryptosystem was high.

We present a very effective attack against this cryptosystem. In practice, one can recover the private key from the public key in less than 10 minutes for the suggested choice of parameters. The attack is based on a systematic use of the notion of the orthogonal lattice, which we introduced as a cryptographic tool at Crypto ’97 [?]. As in [?], this technique enables us to attack the linearity hidden in the key generation process.



2 Description of the Cryptosystem 

The message space is $\mathbb{Z}_M$, the ring of integers modulo an integer $M$. Let $N$ be a product of two large primes $P$ and $Q$. Let $l$ and $n$ be integers such that $l < n$. Select positive integers $q_1, \ldots, q_n$ less than $P^{1/l}$ and distinct primes $q'_1, \ldots, q'_n$ such that:

— For all $i$, $q'_i$ divides $q_i$.

— For all $i \ne j$, $q'_j$ does not divide $q_i/q'_i$.

Choose an integer $t$ in $\mathbb{Z}_N$ coprime with $P$, and integers $k_1, \ldots, k_n$ in $\mathbb{Z}_N$ satisfying the following congruence:

$$k_i \equiv t q_i \pmod{P}.$$

Finally, select random elements $e_1, \ldots, e_n$ in $\mathbb{Z}_M$.

The public key consists of: the $(e_i, k_i)$'s, $M$, $N$, $n$ and $l$.

The secret key consists of: $P$, $Q$, $t$, the $q_i$'s and the $q'_i$'s.

2.1 Encryption 

Let $s \in \mathbb{Z}_M$ be the plaintext. Alice chooses $l$ integers $i_1, \ldots, i_l$ (not necessarily distinct) in $\{1, \ldots, n\}$. The ciphertext is $(m, r) \in \mathbb{Z}_M \times \mathbb{Z}_N$ defined by:

$$m = s + e_{i_1} + e_{i_2} + \cdots + e_{i_l} \pmod{M}$$
$$r = k_{i_1} k_{i_2} \cdots k_{i_l} \pmod{N}$$



2.2 Decryption 

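Let $(m, r)$ be the ciphertext. First, Bob computes $r' = r / t^l \pmod{P}$. We have:

$$r' \equiv q_{i_1} q_{i_2} \cdots q_{i_l} \pmod{P}.$$

Since each $q_i$ is strictly less than $P^{1/l}$, this product is less than $P$, so we actually have:

$$r' = q_{i_1} q_{i_2} \cdots q_{i_l}.$$

Bob then recovers $s$ as follows:

1. Let $i = 1$.

2. If $q'_i$ divides $r'$, let $m := m - e_i \pmod{M}$ and $r' := r'/q_i$.

3. If $r' = 1$, Bob gets $m$ as the plaintext. Otherwise, increment $i$ (unless $q'_i$ still divides $r'$, which happens when an index was chosen more than once) and start again at Step 2.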
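Key generation, encryption and decryption can be sketched in Python with deliberately tiny, insecure parameters. The primes, the cofactors used to build each $q_i$ from $q'_i$, and all helper names are illustrative assumptions, not values from the scheme's designers. The decryption loop divides repeatedly, so an index chosen more than once is handled.

```python
import random
from math import prod


def keygen(seed=1):
    """Toy Itoh-Okamoto-Mambo keys with tiny, insecure parameters."""
    rng = random.Random(seed)
    P, Q = 1_000_000_007, 998_244_353          # two primes, N = P*Q
    N, M, l = P * Q, 2 ** 16, 3                # N, message modulus M, indices per message
    q_prime = [11, 13, 17, 19, 23, 29]         # distinct primes q'_i
    cof = [2, 3, 4, 5, 6, 7]                   # cofactors divisible by no q'_j
    q = [a * c for a, c in zip(q_prime, cof)]  # q'_i | q_i and q'_j does not divide q_i/q'_i
    assert max(q) ** l < P                     # any product of l values q_i stays below P
    t = rng.randrange(2, P)                    # coprime with P because P is prime
    # k_i = t*q_i (mod P), lifted to Z_N with a random residue modulo Q
    k = [(t * qi) % P + P * rng.randrange(Q) for qi in q]
    e = [rng.randrange(M) for _ in q]
    pub = {"e": e, "k": k, "M": M, "N": N, "l": l}
    sec = {"P": P, "t": t, "q": q, "q_prime": q_prime}
    return pub, sec


def encrypt(pub, s, idx):
    """idx holds the l chosen indices (0-based; repetitions allowed)."""
    assert len(idx) == pub["l"]
    m = (s + sum(pub["e"][i] for i in idx)) % pub["M"]
    r = prod(pub["k"][i] for i in idx) % pub["N"]
    return m, r


def decrypt(pub, sec, m, r):
    P = sec["P"]
    rp = r * pow(sec["t"], -pub["l"], P) % P    # r' = r / t^l mod P = q_{i1}...q_{il}
    for i, (qi, qpi) in enumerate(zip(sec["q"], sec["q_prime"])):
        while rp % qpi == 0:                     # q'_i | r' exactly when index i was used
            m = (m - pub["e"][i]) % pub["M"]
            rp //= qi
        if rp == 1:
            break
    return m


pub, sec = keygen()
m, r = encrypt(pub, 12345, [5, 0, 5])
print(decrypt(pub, sec, m, r))   # 12345
```

The modular inverse `pow(t, -l, P)` requires Python 3.8 or later.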
2.3 Parameters

In their paper Itoh, Okamoto and Mambo analyzed several possible attacks, 
including a lattice-reduction attack. They concluded that their cryptosystem was 
secure for the following choice of parameters: 

— $|N| = 1024$ bits, $|P| = 768$ bits and $|Q| = 256$ bits.

— $n = 180$ and $l = 17$.

— $q_{\max} = 2^{45}$ (6 bytes) and $q'_{\max} = 2^{32}$ (4 bytes).

In this example, the public key takes 45 Kbytes and the private key takes 1.8 
Kbytes. Compared to RSA-1024 with small exponent, encryption speed is simi- 
lar, but decryption is about 50 times faster. 

3 The Orthogonal Lattice 

We recall a few useful facts about the notion of an orthogonal lattice, which was introduced in [?] as a cryptographic tool. Let $L$ be a lattice in $\mathbb{Z}^n$, where $n$ is any integer. The orthogonal lattice $L^\perp$ is defined as the set of elements in $\mathbb{Z}^n$ which are orthogonal to all the lattice points of $L$, with respect to the usual dot product. We define the lattice $\bar{L} = (L^\perp)^\perp$, which contains $L$ and whose determinant divides that of $L$. The results of [?] which are of interest to us are the following two theorems:

Theorem 1. If $L$ is a lattice in $\mathbb{Z}^n$, then $\dim(L) + \dim(L^\perp) = n$ and:

$$\det(L^\perp) = \det(\bar{L}).$$

Thus, $\det(L^\perp)$ divides $\det(L)$. This implies that if $L$ is a low-dimensional lattice in $\mathbb{Z}^n$, then a reduced basis of $L^\perp$ will consist of very short vectors compared to a reduced basis of $L$. In practice, most of the vectors of any reduced basis of $L^\perp$ are quite short, with norm around $\det(\bar{L})^{1/(n - \dim L)}$.

Theorem 2. There exists an algorithm which, given as input a basis of a lattice $L$ in $\mathbb{Z}^n$, outputs an LLL-reduced basis of the orthogonal lattice $L^\perp$, and whose running time is polynomial with respect to $n$, $d = \dim(L)$ and the size of the basis elements.

In practice, one obtains a simple and very effective algorithm (which consists of a single lattice reduction, described in [?]) to compute a reduced basis of the orthogonal lattice, thanks to the celebrated LLL algorithm [?]. This means that, given a low-dimensional $L$ in $\mathbb{Z}^n$, one can easily compute many short and linearly independent vectors in $L^\perp$.

4 Attacking the Scheme by Orthogonal Lattices 

Let $m$ be an integer less than $n$. Define the following vectors in $\mathbb{Z}^m$:

$$\mathbf{k} = (k_1, k_2, \ldots, k_m), \qquad \mathbf{q} = (q_1, q_2, \ldots, q_m).$$
Note that an attacker knows $\mathbf{k}$, but not $\mathbf{q}$. By construction of the keys, we have the following congruence:

$$\mathbf{k} \equiv t\mathbf{q} \pmod{P}.$$

This leads to a simple remark: 

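Lemma 3. Let $\mathbf{u} \in \mathbb{Z}^m$. If $\mathbf{u} \perp \mathbf{k}$, then $\mathbf{u} \perp \mathbf{q}$ or $\|\mathbf{u}\| \ge P/\|\mathbf{q}\|$.

Proof. We have $t\,\mathbf{q}\cdot\mathbf{u} \equiv \mathbf{k}\cdot\mathbf{u} = 0 \pmod{P}$. Since $t$ is coprime with $P$, $\mathbf{q}\cdot\mathbf{u} \equiv 0 \pmod{P}$. Hence either $\mathbf{q}\cdot\mathbf{u} = 0$ or $|\mathbf{q}\cdot\mathbf{u}| \ge P$, and in the latter case the Cauchy-Schwarz inequality gives $\|\mathbf{u}\|\,\|\mathbf{q}\| \ge P$. □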

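This remark is interesting because $\|\mathbf{q}\|$ is much smaller than $P$. Indeed, since each $q_i < P^{1/l}$, we have:

$$\|\mathbf{q}\| < \sqrt{m}\, P^{1/l}.$$

Therefore, if $\mathbf{u} \in \mathbb{Z}^m$ is orthogonal to $\mathbf{k}$, then it is also orthogonal to $\mathbf{q}$ or satisfies

$$\|\mathbf{u}\| > \frac{P^{1-1/l}}{\sqrt{m}}, \qquad (1)$$

which implies that $\mathbf{u}$ is quite long.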

Furthermore, from the previous section, one can expect to find many vectors orthogonal to $\mathbf{k}$, with norm around $\|\mathbf{k}\|^{1/(m-1)}$. This quantity is much smaller than the right-hand side of (1) when $m$ is large enough, so we make the following assumption:

Assumption 4. Let $(\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_{m-1})$ be a reduced basis of $\mathbf{k}^\perp$. Then the first $m-2$ vectors $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_{m-2}$ are orthogonal to $\mathbf{q}$.

Actually, one can prove that the first vector of an LLL-reduced basis satisfies the assumption, but this is not enough.

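Now assume that the hypothesis holds. Then $\mathbf{q}$ belongs to the 2-dimensional lattice $L = (\mathbf{b}_1, \ldots, \mathbf{b}_{m-2})^\perp$. One expects the vectors $\mathbf{b}_1, \ldots, \mathbf{b}_{m-2}$ to have norm around $\|\mathbf{k}\|^{1/(m-1)}$. Therefore, the determinant of $L$ should be around

$$\|\mathbf{k}\|^{(m-2)/(m-1)} \approx \|\mathbf{k}\|,$$

so a typical short vector of $L$ would have norm around $\|\mathbf{k}\|^{1/2}$. But $\mathbf{q}$ belongs to $L$ and its norm is much smaller than $\|\mathbf{k}\|^{1/2}$. This leads to a more general assumption, as follows: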

Assumption 5. Let $(\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_{m-1})$ be a reduced basis of $\mathbf{k}^\perp$. Then $\mathbf{q}$ is a shortest vector of the 2-dimensional lattice $(\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_{m-2})^\perp$.

If this hypothesis holds, one can use the Gaussian algorithm for lattice reduc- 
tion (which has worst-case polynomial time and average-case constant time) to 
recover ±q. 
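The Gaussian (Lagrange) reduction step can be sketched as the textbook two-dimensional algorithm, here with exact rational rounding. The example basis is made up: it hides a short vector, playing the role of $\mathbf{q}$, inside a unimodular combination with a long vector.

```python
from fractions import Fraction


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a basis (u, v) of a 2-dimensional lattice.

    Returns a reduced basis whose first vector is a shortest non-zero vector.
    """
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        r = round(Fraction(dot(u, v), dot(u, u)))   # nearest integer to <u,v>/<u,u>
        v = [b - r * a for a, b in zip(u, v)]        # v <- v - r*u
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u                                  # v became shorter: swap and repeat


q = [3, 5, 7, 2]                                 # hypothetical short vector
w = [1000, -2000, 500, 1300]                     # hypothetical long vector
u = [5 * a + 3 * b for a, b in zip(q, w)]        # (u, v) = unimodular combination of (q, w)
v = [7 * a + 4 * b for a, b in zip(q, w)]
s, _ = gauss_reduce(u, v)
print(s)                                         # q or -q
```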
Next, we easily recover the secret factorization $P \times Q$ using the so-called differential attack described in [?]. More precisely, there exist integers $p_1, \ldots, p_n$ such that:

$$k_i \equiv p_i P + t q_i \pmod{N}.$$

Therefore, we have for all $i \ne j$:

$$q_j k_i - q_i k_j \equiv (p_i q_j - p_j q_i) P \pmod{N}.$$

It is likely that $\gcd(q_j k_i - q_i k_j, N)$ is equal to $P$. If it is not, we can try again with a different pair $(i, j)$.
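Once $\mathbf{q}$ (or $-\mathbf{q}$) is known, the differential step is a single gcd computation. The sketch below uses small, made-up values of $P$, $Q$, $t$, $q_i$ and $p_i$, and the function name is hypothetical.

```python
from math import gcd


def differential_factor(q, k, N):
    """Return a proper factor of N from gcd(q_j k_i - q_i k_j, N), or None."""
    for i in range(len(q)):
        for j in range(i + 1, len(q)):
            g = gcd(q[j] * k[i] - q[i] * k[j], N)   # always a multiple of P
            if 1 < g < N:
                return g
    return None


# Toy instance (hypothetical values): k_i = t*q_i (mod P), arbitrary modulo Q.
P, Q, t = 1_000_003, 10_007, 123_457
N = P * Q
q = [11, 13, 17, 19]
p = [5, 77, 1234, 9999]                      # arbitrary multipliers of P, each below Q
k = [(t * qi) % P + P * pi for qi, pi in zip(q, p)]
print(differential_factor(q, k, N))          # 1000003
```

Because $q_j k_i - q_i k_j$ is always a multiple of $P$, any gcd strictly between 1 and $N$ must equal $P$.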

To sum up, the attack is the following: 

1. Select an integer $m < n$.

2. Compute a reduced basis $(\mathbf{b}_1, \ldots, \mathbf{b}_{m-1})$ of the lattice $\mathbf{k}^\perp$.

3. Compute a reduced basis $(\mathbf{a}_1, \mathbf{a}_2)$ of the lattice $(\mathbf{b}_1, \ldots, \mathbf{b}_{m-2})^\perp$.

4. Compute a shortest vector $\mathbf{s}$ of the previous lattice.

5. Select integers $i \ne j$ in $\{1, \ldots, m\}$ and denote the coordinates of $\mathbf{s}$ by $s_i$.

6. If $\gcd(s_j k_i - s_i k_j, N)$ is not a proper factor of $N$, restart at the previous step.

In practice, we perform Steps 3 and 4 by a single LLL reduction and take $\mathbf{a}_1$ as $\mathbf{s}$. Only Steps 2 and 3 take noticeable time. Note that we do not need to compute a complete reduced basis in Step 2, since the last vector is useless.
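The six steps can be exercised end to end on a toy instance. The self-contained sketch below implements a plain exact LLL with Python fractions (in place of NTL's floating-point variants). It computes each orthogonal lattice with one reduction of an embedding matrix whose scale factor `c` is our own heuristic choice, and then runs Steps 2 to 6. Key sizes, seed and helper names are assumptions chosen so that it runs in seconds, far below the suggested parameters.

```python
import random
from fractions import Fraction
from math import gcd, isqrt, prod


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lll(B, delta=Fraction(3, 4)):
    """Exact textbook LLL on the rows of B, with Cohen-style (Alg. 2.6.3) updates."""
    B = [list(b) for b in B]
    n = len(B)
    mu = [[Fraction(0)] * n for _ in range(n)]
    Bs, Bsq = [], []
    for i in range(n):                                   # exact Gram-Schmidt, once
        v = [Fraction(x) for x in B[i]]
        for j in range(i):
            mu[i][j] = dot(B[i], Bs[j]) / Bsq[j]
            v = [a - mu[i][j] * c for a, c in zip(v, Bs[j])]
        Bs.append(v)
        Bsq.append(dot(v, v))

    def size_reduce(k, l):                               # b_k <- b_k - round(mu_kl) b_l
        r = round(mu[k][l])
        if r:
            B[k] = [a - r * c for a, c in zip(B[k], B[l])]
            mu[k][l] -= r
            for i in range(l):
                mu[k][i] -= r * mu[l][i]

    k = 1
    while k < n:
        size_reduce(k, k - 1)
        if Bsq[k] < (delta - mu[k][k - 1] ** 2) * Bsq[k - 1]:   # Lovasz test fails: swap
            m_ = mu[k][k - 1]
            B[k], B[k - 1] = B[k - 1], B[k]
            for j in range(k - 1):
                mu[k][j], mu[k - 1][j] = mu[k - 1][j], mu[k][j]
            Bn = Bsq[k] + m_ ** 2 * Bsq[k - 1]
            mu[k][k - 1] = m_ * Bsq[k - 1] / Bn
            Bsq[k] = Bsq[k - 1] * Bsq[k] / Bn
            Bsq[k - 1] = Bn
            for i in range(k + 1, n):
                tmp = mu[i][k]
                mu[i][k] = mu[i][k - 1] - m_ * tmp
                mu[i][k - 1] = tmp + mu[k][k - 1] * mu[i][k]
            k = max(k - 1, 1)
        else:
            for l in range(k - 2, -1, -1):
                size_reduce(k, l)
            k += 1
    return B


def orthogonal_lattice(vectors):
    """LLL-reduced basis of the orthogonal lattice of span(vectors) in Z^n.

    One LLL reduction of the rows (c*A^T | I_n): rows whose first d entries
    vanish carry, in their last n entries, vectors orthogonal to every input.
    The scale c is a heuristic (assumption), ample for these toy sizes.
    """
    d, n = len(vectors), len(vectors[0])
    c = 2 ** (n // 2 + 1) * prod(isqrt(dot(v, v)) + 1 for v in vectors)
    rows = [[c * v[i] for v in vectors] + [int(i == j) for j in range(n)]
            for i in range(n)]
    return [r[d:] for r in lll(rows) if not any(r[:d])]


def attack(k, N, m):
    """Steps 2-6 on the first m public values; returns (factor, s)."""
    k = k[:m]
    b = orthogonal_lattice([k])             # Step 2: reduced basis b_1..b_{m-1} of k^perp
    a = orthogonal_lattice(b[:m - 2])       # Steps 3-4 in one LLL; a_1 is taken as s
    s = a[0]
    for i in range(m):                      # Steps 5-6: differential gcd
        for j in range(i + 1, m):
            g = gcd(s[j] * k[i] - s[i] * k[j], N)
            if 1 < g < N:
                return g, s
    return None, s


rng = random.Random(2024)
P, Q, m = 1_000_003, 10_007, 10                         # toy sizes (assumption)
q = [rng.randrange(2, 21) for _ in range(m)]            # secret small q_i
t = rng.randrange(2, P)
k = [(t * qi) % P + P * rng.randrange(Q) for qi in q]   # k_i = t*q_i (mod P)
factor, s = attack(k, P * Q, m)
print(factor == P, s)
```

With these sizes any vector of $\mathbf{k}^\perp$ that is not orthogonal to $\mathbf{q}$ has norm at least $P/\|\mathbf{q}\|$, in the tens of thousands, so the first $m-2$ reduced vectors satisfy Assumption 4 with a wide margin.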

Once $\mathbf{q}$ and the secret factorization of $N$ are found, it is not a problem to recover the rest of the secret key:

— $t$ modulo $P$ is given by $k_i \equiv t q_i \pmod{P}$.

— The $q'_i$'s (or something equivalent) are revealed by the factors of the $q_i$'s.
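The remaining key recovery is elementary modular arithmetic. In this sketch (toy values, hypothetical helper names), $t$ is recovered from one congruence and checked against the others, and trial division lists the prime factors of each $q_i$, among which the corresponding $q'_i$ appears. Since the $q_i$ are positive, a recovered $-\mathbf{q}$ would simply be negated first.

```python
def recover_t(q, k, P):
    """Recover t mod P from k_1 = t*q_1 (mod P); q_1 is invertible since 0 < q_1 < P."""
    t = k[0] * pow(q[0], -1, P) % P
    assert all((t * qi - ki) % P == 0 for qi, ki in zip(q, k))   # every congruence holds
    return t


def prime_factors(x):
    """Prime factors of x (with multiplicity) by trial division."""
    factors, p = [], 2
    while p * p <= x:
        while x % p == 0:
            factors.append(p)
            x //= p
        p += 1
    if x > 1:
        factors.append(x)
    return factors


P, t = 1_000_003, 123_457                  # toy values
q = [22, 39, 68]
k = [(t * qi) % P + P * 7 for qi in q]     # k_i = t*q_i (mod P)
print(recover_t(q, k, P), [prime_factors(qi) for qi in q])
# 123457 [[2, 11], [3, 13], [2, 2, 17]]
```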

5 Experiments 

We implemented the attack using the NTL package [?], which includes efficient lattice-reduction algorithms. We used the floating-point LLL version with extended exponent to compute orthogonal lattices, since the entries of $\mathbf{k}$ were too large (about the size of $N$) for the usual floating-point version.

In practice, the attack reveals the secret factorization as soon as $m > 4$ for the suggested choice of parameters. When $m < 20$, the total computation time is less than 10 minutes on an UltraSparc-I clocked at 167 MHz.

6 Conclusion 

We showed that the cryptosystem presented by Itoh, Okamoto and Mambo at 
SAC ’97 is not secure. The core of our attack is the notion of the orthogonal 
lattice which we introduced at Crypto ’97 in order to cryptanalyze a knapsack-like cryptosystem proposed by Qu and Vanstone. The attack is very similar to the attack we devised against the so-called Merkle-Hellman transformations. This is because the congruence $\mathbf{k} \equiv t\mathbf{q} \pmod{P}$, which is used in the key generation process, looks like a Merkle-Hellman transformation: in a Merkle-Hellman equation, we have an equality instead of a congruence.

We suggest that the design of multiplicative knapsack cryptosystems should avoid any kind of linearity, although this might come at the expense of efficiency.
Abstract. Ajtai recently found a random class of lattices of integer
points for which he could prove the following worst-case/average-case 
equivalence result: If there is a probabilistic polynomial time algorithm 
which finds a short vector in a random lattice from the class, then there 
is also a probabilistic polynomial time algorithm which solves several 
problems related to the shortest lattice vector problem (SVP) in any 
n-dimensional lattice. Ajtai and Dwork then designed a public-key cryp- 
tosystem which is provably secure unless the worst case of a version of the 
SVP can be solved in probabilistic polynomial time. However, their cryp- 
tosystem suffers from massive data expansion because it encrypts data 
bit-by-bit. Here we present a public-key cryptosystem based on similar 
ideas, but with much less data expansion. 

Keywords: Public-key cryptosystem, lattice, cryptographic security. 



1 Introduction 

Since the origin of the idea of public-key cryptography, there have been many 
public-key techniques described in the literature. The security of essentially all of 
these depends on certain widely believed but unproven mathematical hypotheses. For example, the well-known RSA public-key cryptosystem relies on the hypothesis that it is difficult to factor a large integer $n$ which is known to be a product of two large primes. This hypothesis has been extensively studied, but there is still no proof that for a typical such $n$, the prime factors cannot be found in less than $k$ steps, where $k$ is a very large number. From a computational complexity point of view, we generate a specific instance of a problem
in NP (together with a solution, which is kept secret) and we rely on the belief 
that the problem is difficult to solve. 

Apart from the lack of proof that any of these problems is really hard, i.e., that no efficient algorithm solves the problem in all cases, there is another serious issue. The mathematical hypothesis that these problems are difficult to solve really means difficult to solve in the worst case, but the security

* Research supported in part by NSF grant CCR-9634665 and an Alfred P. Sloan 
Fellowship. 
of the cryptographic algorithms depends more on the difficulty of the average case. For example, even if factoring is one day proved to be unsolvable in probabilistic polynomial time, there is no guarantee to the users of the RSA system that the key they are actually using is hard to factor. To use these protocols, one must be able to generate specific instances of the problem which are hard to solve, but typically there is no way to generate instances that are known to be hard. One approach is to generate random instances of the problem and hope that such instances are as hard on average as in the worst case. However, this property is known to fail for a number of NP-hard problems.

Recently Ajtai [?] proved that certain lattice problems related to SVP have essentially the same average-case and worst-case complexity, and both are conjectured to be extremely hard. This development raises the possibility of public-key cryptosystems with a new level of security. Ajtai and Dwork [?] have already proposed a public-key cryptosystem which has a provable worst-case/average-case equivalence. Specifically, the Ajtai-Dwork cryptosystem is secure unless the worst case of a certain lattice problem can be solved in probabilistic polynomial time.

Goldreich, Goldwasser and Halevi have also given a public-key cryptosystem which depends on lattice problems related to SVP, similar to those in [?]. Unlike the work of [?], however, their method uses a trapdoor one-way function and also lacks a proof of worst-case/average-case equivalence.

The cryptosystems of [?] are unfortunately far from being practical. All of them encrypt messages bit-by-bit and involve massive data expansion: the encryption will be at least a hundred times as long as the message. (Note: In a private communication, Ajtai has informed us that this data expansion problem is being addressed by the authors of [?] as well.) In this paper we propose a public-key cryptosystem, based on the ideas of [?] and [?], which has much less data expansion. Messages are encrypted in blocks instead of bit-by-bit. We offer some statistical analysis of our cryptosystem. We also analyze several attacks on the system and show that the system is secure against these attacks. Whether there is a provable worst-case/average-case equivalence for this system is open.

2 Lattice problems with worst-case/average-case equivalence

Here we briefly define the terms for lattice problems, and describe the results of Ajtai [?] and some improvements.

Notation. $\mathbb{R}$ is the field of real numbers, $\mathbb{Z}$ is the ring of integers, $\mathbb{R}^n$ is the space of $n$-dimensional real vectors $a = (a_1, \ldots, a_n)$ with the usual dot product $a \cdot b$ and Euclidean norm or length $\|a\| = (a \cdot a)^{1/2}$, $\mathbb{Z}^n$ is the set of vectors in $\mathbb{R}^n$ with integer coordinates, $\mathbb{Z}^+$ is the set of positive integers and $\mathbb{Z}_q$ is the ring of integers modulo $q$.

Definition. If $A = \{a_1, \ldots, a_n\}$ is a set of linearly independent vectors in $\mathbb{R}^n$, then we say that the set of vectors

$$\left\{ \sum_{i=1}^{n} k_i a_i : k_1, \ldots, k_n \in \mathbb{Z} \right\}$$

is a lattice in $\mathbb{R}^n$. We will denote the lattice by $L(A)$ or $L(a_1, \ldots, a_n)$. We call $A$ a basis of the lattice. We say that a set $L$ in $\mathbb{R}^n$ is an $n$-dimensional lattice if there is a basis $V$ of $n$ linearly independent vectors such that $L = L(V)$. If $A = \{a_1, \ldots, a_n\}$ is a set of vectors in a lattice $L$, then we define the length of the set $A$ by $\max_{1 \le i \le n} \|a_i\|$. Finally, $\lambda_1(L) = \min_{0 \ne v \in L} \|v\|$ denotes the length of a shortest non-zero vector of $L$.

A fundamental theorem of Minkowski is the following: 

Theorem 1 (Minkowski). There is a universal constant $\gamma$ such that for any lattice $L$ of dimension $n$, there exists $v \in L$, $v \ne 0$, such that

$$\|v\| \le \gamma \sqrt{n}\, \det(L)^{1/n}.$$

The determinant $\det(L)$ of a lattice is the volume of the $n$-dimensional fundamental parallelepiped, and the absolute constant $\gamma$ is known as Hermite's constant.

Minkowski's theorem is a pure existence theorem; it offers no clue as to how to find a short or shortest non-zero vector in a high-dimensional lattice. Finding the shortest non-zero vector in an $n$-dimensional lattice, given in terms of a basis, is known as the Shortest Vector Problem (SVP). There are no known efficient algorithms for finding the shortest non-zero vector in the lattice. Nor are there efficient algorithms to find an approximately shortest non-zero vector, or just to approximate its length, within any fixed polynomial factor in the dimension $n$. This is still true even if the shortest non-zero vector $v$ is unique in the sense that any other vector in the lattice whose length is at most $n^c \|v\|$ is parallel to $v$, where $c$ is an absolute constant. In this case we say that $v$ is unique up to a polynomial factor.

The best algorithm to date for finding a short vector in an arbitrary lattice in $\mathbb{R}^n$ is the algorithm of A.K. Lenstra, H.W. Lenstra and L. Lovász [?]. This algorithm finds in deterministic polynomial time a vector which differs from the shortest one by at most a factor $2^{(n-1)/2}$. C.P. Schnorr [?] proved that the factor $2^{(n-1)/2}$ can be replaced by $(1 + \epsilon)^n$ for any fixed $\epsilon > 0$. However, Schnorr's algorithm has a running time with $1/\epsilon$ in the exponent.

Regarding computational complexity, Ajtai [?] proved that it is NP-hard to find the shortest lattice vector in Euclidean norm, as well as to approximate the shortest vector length up to a factor of $1 + [\ldots]$. In a forthcoming paper [?], Cai and Nerurkar improve the NP-hardness result of Ajtai [?] to show that the problem of approximating the shortest vector length up to a factor of $1 + 1/n^{\epsilon}$, for any $\epsilon > 0$, is also NP-hard. This improvement also works for all $\ell_p$-norms, for $1 \le p < \infty$. Prior to that, it was known that the shortest lattice vector problem is NP-hard for the $\ell_\infty$-norm, and the nearest lattice vector problem is NP-hard under all $\ell_p$-norms, $p \ge 1$ [?]. Even finding an approximate solution to within any constant factor for the nearest vector problem for any $\ell_p$-norm is NP-hard [?]. On the other hand, Lagarias, Lenstra and Schnorr [?] showed that the approximation problem (in $\ell_2$-norm) within a factor of $O(n)$ cannot be NP-hard, unless NP = coNP. Goldreich and Goldwasser showed that approximating the shortest lattice vector within a factor of $O(\sqrt{n/\log n})$ is not NP-hard assuming the polynomial time hierarchy does not collapse [?]. Cai



222 



Jin-Yi Cai and Thomas W. Cusick 



showed that finding an $n^{1/4}$-unique shortest lattice vector is not NP-hard unless the polynomial time hierarchy collapses [?].

What is most striking is a recent result of Ajtai [?] establishing the first explicit connection between the worst-case and the average-case complexity of the problem of finding the shortest lattice vector or approximating its length. The connection factor in the Ajtai connection has been improved in [?]. Ajtai defined a class of lattices in $\mathbb{Z}^m$ so that if there is a probabilistic polynomial time algorithm which finds a short vector in a random lattice from the class with a probability of at least […], then there is also a probabilistic polynomial time algorithm which solves the following three lattice problems in every lattice in $\mathbb{Z}^n$ with a probability exponentially close to 1:

(P1) Find the length of a shortest non-zero vector in an n-dimensional lattice,
up to a polynomial factor. 

(P2) Find the shortest non-zero vector in an n-dimensional lattice where the 
shortest vector is unique up to a polynomial factor. 

(P3) Find a basis in an n-dimensional lattice whose length is the smallest pos- 
sible, up to a polynomial factor. 

The lattices in the random class are defined modulo $q$ ($q$ is an integer depending only on $n$, as described below); that is, if two integer vectors are congruent modulo $q$, then either both of them or neither of them belong to the lattice. More precisely, if $\nu = \{u_1, \ldots, u_m\}$ is a given set of vectors in $\mathbb{Z}^n$, then the lattice $\Lambda(\nu)$ is the set of all integer vectors $(h_1, \ldots, h_m)$ such that

$$\sum_{i=1}^{m} h_i u_i \equiv 0 \pmod{q}.$$

For fixed $n$, $m$ and $q$, the probability distribution over the random class is defined by uniformly choosing a sequence of integer vectors $(u_1, \ldots, u_m)$. For a given $n$, the parameters $m$ and $q$ are defined by $m = \lfloor c_1 n \rfloor$ and $q = \lfloor n^{c_2} \rfloor$, where $c_1$ and $c_2$ are suitable constants.

The problem of finding a short vector in a lattice from the random class is a Diophantine problem. Questions of this type date back to Dirichlet's 1842 theorem on simultaneous Diophantine approximation. From this point of view, the problem can be stated in the following way, which does not involve any explicit mention of lattices, as pointed out in [?]:

(A1) Given $n$, $m = \lfloor c_1 n \rfloor$, $q = \lfloor n^{c_2} \rfloor$ and an $n \times m$ matrix $M$ with entries in $\mathbb{Z}_q$, find a non-zero vector $x$ such that $Mx \equiv 0 \pmod{q}$ and $\|x\| < n$.

Minkowski's theorem guarantees the existence of such short vectors $x$. Of course, if the condition on $\|x\|$ is removed, then the linear system $Mx \equiv 0 \pmod{q}$ can be solved in polynomial time.
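A pigeonhole argument, a standard counting alternative to Minkowski's theorem, shows why short solutions of $Mx \equiv 0 \pmod q$ exist once $2^m > q^n$. Two distinct 0/1 vectors must collide modulo $q$, and their difference is a non-zero vector with entries in $\{-1, 0, 1\}$. The brute-force sketch below only illustrates this existence argument on tiny, hypothetical parameters. It takes exponential time and is not a way to solve (A1).

```python
import itertools
import random


def short_kernel_vector(M, q):
    """Return a non-zero x in {-1, 0, 1}^m with M x = 0 (mod q).

    Brute-force pigeonhole over the 2^m vectors y in {0, 1}^m: once two of them
    give the same M y mod q, their difference is a short kernel vector.
    Exponential time; this only illustrates existence when 2^m > q^n.
    """
    n, m = len(M), len(M[0])
    assert 2 ** m > q ** n, "pigeonhole needs 2^m > q^n"
    seen = {}
    for y in itertools.product((0, 1), repeat=m):
        key = tuple(sum(a * b for a, b in zip(row, y)) % q for row in M)
        if key in seen:
            return [a - b for a, b in zip(y, seen[key])]
        seen[key] = y
    raise AssertionError("unreachable when 2^m > q^n")


rng = random.Random(0)
n, m, q = 2, 8, 7                          # toy sizes: 2^8 = 256 > 7^2 = 49
M = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
x = short_kernel_vector(M, q)
print(x)
```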

The theorem in [?] reduces the worst-case complexity of each of the problems (P1), (P2), (P3) to the average-case complexity of (A1). Currently, the best bounds that can be achieved are stated below [?, ?]:

Theorem 2 [?]. For any constant $\epsilon > 0$, if there exists a probabilistic polynomial time algorithm $\mathcal{A}$ such that, for a given random lattice $\Lambda(\nu)$, where $\nu = (u_1, \ldots, u_m)$ is uniformly chosen, $q = O(n^{\ldots})$ and $m = O(n)$, $\mathcal{A}$ will find a vector of the lattice $\Lambda(\nu)$ of length $\le n$ with probability […], then there also exists a probabilistic polynomial time algorithm $\mathcal{B}$ which, for any given lattice $L = L(a_1, \ldots, a_n)$ given by a basis $a_1, \ldots, a_n \in \mathbb{Z}^n$, outputs another basis $b_1, \ldots, b_n$ for $L$ so that

$$\max_{i=1}^{n} \|b_i\| \le O(n^{3.5+\epsilon}) \min_{\text{all bases } b' \text{ for } L} \; \max_{i=1}^{n} \|b'_i\|.$$

Theorem 3 [?]. Under the same hypothesis, there exists a probabilistic polynomial time algorithm $\mathcal{C}$ which, for any given lattice $L = L(a_1, \ldots, a_n)$ given by a basis, will

— compute an estimate of $\lambda_1 = \lambda_1(L)$ up to a factor […], i.e., compute a numerical estimate $\tilde{\lambda}_1$ such that $\tilde{\lambda}_1 / […] \le \lambda_1 \le \tilde{\lambda}_1$;

— find the unique shortest vector if it is an […]-unique shortest vector.



3 A new cryptosystem 

Here we present the design of a new cryptosystem, which is based on the difficulty of finding or approximating SVP, even though no specific lattices are defined. The secret key in the new system is a vector $u$ chosen with uniform distribution from the unit sphere $S^{n-1} = \{x \mid \|x\| = 1\}$, and a random permutation $\sigma$ on $m+1$ letters. By allowing an exponentially small round-off error, we may assume that the coordinates of $u$ are rational numbers whose denominators are bounded by some very large integer, exponential in $n$. Let $m = \lfloor cn \rfloor$ for a suitable absolute constant $c < 1$. For definiteness, set $c = 1/2$. Let $H_i = \{v \mid v \cdot u = i\}$ denote the hyperplanes perpendicular to $u$. The public key in the new system is a parameter $b > 0$ and a set $\{v_{\sigma(0)}, \ldots, v_{\sigma(m)}\}$ of rational vectors, where each $v_j$ is in one of the hyperplanes $H_i$ for some $i \in \mathbb{Z}^+$, say $v_j \cdot u = N_j \in \mathbb{Z}^+$. We choose a sequence of numbers $N_j$ so that it is superincreasing, that is

$$N_0 > b \quad \text{and} \quad N_i > \sum_{j=0}^{i-1} N_j + b \quad \text{for each } i = 1, 2, \ldots, m.$$

Binary plaintext is encrypted in blocks of $m+1$ bits. If $P = (\delta_0, \ldots, \delta_m)$ is a plaintext block ($\delta_i = 0$ or $1$), then $P$ is encrypted as a random perturbation of $\sum_{i=0}^{m} \delta_i v_{\sigma(i)}$. More precisely, the sender picks a uniformly chosen random vector $r$ with $\|r\| < b/2$. Then the ciphertext is

$$w = \sum_{i=0}^{m} \delta_i v_{\sigma(i)} + r.$$
Decryption is accomplished by using the secret key $u$ to compute the following inner product:

$$S = w \cdot u = \sum_{i=0}^{m} \delta_i N_{\sigma(i)} + r \cdot u, \qquad |r \cdot u| \le \|r\| < b/2.$$

Since the $N_i$ are superincreasing, we can use the greedy algorithm to efficiently recover the $\delta_{\sigma^{-1}(j)}$ from $S$, and then use the secret $\sigma$ to recover the $\delta_i$.

More precisely, if $\delta_{\sigma^{-1}(m)} = 1$ then $S > N_m - b/2$, and if $\delta_{\sigma^{-1}(m)} = 0$ then $S < N_0 + N_1 + \cdots + N_{m-1} + b/2$. Since $N_m > \sum_{i=0}^{m-1} N_i + b$, these two ranges do not overlap, so with the secret key one can discover whether $\delta_{\sigma^{-1}(m)} = 0$ or $1$. Substituting $S$ by $S' = S - \delta_{\sigma^{-1}(m)} N_m$, this process can be continued until all $\delta_{\sigma^{-1}(j)}$ are recovered. Then, using the secret permutation $\sigma$, one recovers $\delta_0, \ldots, \delta_m$.
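Key generation, encryption and greedy decryption can be sketched with floating-point vectors. This is a toy under stated assumptions. The values $N_j$ (built as $N_j = \sum_{i<j} N_i + b + 1$), the common length $2N_m$ of all public vectors, and the construction of each $v_j$ as $N_j u$ plus a random component orthogonal to $u$ are illustrative choices, not the distribution $D$ of the paper. The sizes are far too small for security.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def keygen(n=8, b=1.0, seed=7):
    """Toy key: secret unit vector u, superincreasing N_j, secret permutation sigma."""
    rng = random.Random(seed)
    m = n // 2                                        # m = cn with c = 1/2
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [x / math.sqrt(dot(g, g)) for x in g]
    N = []
    for _ in range(m + 1):
        N.append(sum(N) + b + 1)                      # N_0 > b, N_i > N_0 + ... + N_{i-1} + b
    L = 2 * N[-1]                                     # common length of all v_j (assumption)
    v = []
    for Nj in N:
        h = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hu = dot(h, u)
        h = [a - hu * c for a, c in zip(h, u)]        # random direction orthogonal to u
        scale = math.sqrt(L * L - Nj * Nj) / math.sqrt(dot(h, h))
        v.append([Nj * c + scale * a for a, c in zip(h, u)])   # v_j.u = N_j, |v_j| = L
    sigma = list(range(m + 1))
    rng.shuffle(sigma)
    pub = [v[sigma[i]] for i in range(m + 1)]         # public: v_sigma(0), ..., v_sigma(m)
    return pub, (u, N, sigma)


def encrypt(pub, bits, b=1.0, rng=random):
    """w = sum_i delta_i v_sigma(i) + r, with r uniform in the open ball |r| < b/2."""
    n = len(pub[0])
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = (b / 2) * rng.random() ** (1 / n) / math.sqrt(dot(g, g))
    w = [scale * x for x in g]
    for d, vi in zip(bits, pub):
        if d:
            w = [a + c for a, c in zip(w, vi)]
    return w


def decrypt(w, sk, b=1.0):
    u, N, sigma = sk
    S = dot(w, u)                                     # sum of delta_i N_sigma(i) + r.u
    d = [0] * len(N)
    for j in reversed(range(len(N))):                 # greedy on the superincreasing N_j
        if S > N[j] - b / 2:
            d[j], S = 1, S - N[j]
    return [d[sigma[i]] for i in range(len(N))]       # delta_i = d_sigma(i)


pub, sk = keygen()
bits = [1, 0, 1, 1, 0]
print(decrypt(encrypt(pub, bits), sk) == bits)
```

The threshold $N_j - b/2$ mirrors the two non-overlapping ranges described above.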

Thus decryption using $u$ and $\sigma$ involves an easy instance of a knapsack problem. As summarized in the article of Odlyzko [?], essentially all suggestions for cryptosystems based on knapsack problems have been broken. Here, however, the easy knapsack problem appears to have no bearing on the security of the system, since it appears that one must first search for the direction $u$.

The new cryptosystem has similarities with the third version of the Ajtai-Dwork cryptosystem (see [?]), but in the new system $m + 1 = O(n)$ bits of plaintext are encrypted to an $n$-dimensional ciphertext vector, instead of just one bit of plaintext.

We have not specified the distribution of the $v_i$, aside from its inner product with $u$ being superincreasing. The following distribution has a strong statistical indistinguishability from $m+1$ independent uniform samples of the sphere. Let $M$ be a large integer, say, $M \gg 2^m$. Choose any $b' > b$. For analysis purposes we […] the $(n-2)$-dimensional unit sphere orthogonal to $u$. Note that each $\|v_i\| = 1$, after normalization. We denote this distribution by $D$. We note that […] $1 - p_i$, where the $p_i$'s are independently and uniformly distributed on […]

How secure is this new cryptosystem? We do not have a proof of worst-case/average-case equivalence, but we can discuss several ideas for attacks that do not seem to work. The following discussion also explains some of the choices made in the design of the cryptosystem.

We will first show that if we did not employ the random permutation $\sigma$, but instead published as public key the unpermuted vectors $v_0, \ldots, v_m$, then an attack based on linear programming would break the system in polynomial time.

The attack works as follows. From the given vectors $v_0, \ldots, v_m$, we are assured that the following set of inequalities defines a non-empty convex body containing the secret vector $u$:

$$v_0 \cdot x > b$$
$$v_1 \cdot x > v_0 \cdot x + b$$
$$v_2 \cdot x > (v_0 + v_1) \cdot x + b$$
$$\vdots$$
$$v_m \cdot x > (v_0 + v_1 + \cdots + v_{m-1}) \cdot x + b$$

Using linear programming to find a feasible point of this convex body, we can compute in polynomial time a vector $\tilde{u}$ satisfying all the inequalities. Even though $\tilde{u}$ may not be equal to $u$, as long as $\tilde{u}$ satisfies the above set of inequalities, it is as good as $u$ itself for decrypting messages. Hence, the permutation $\sigma$ is essential to the security of the protocol.

Next, let us consider the addition of the random perturbation $r$. This is to guard against an attack based on linear algebra, which works as follows.

Assume the message $w = \sum_{i=0}^{m} \delta_i v_{\sigma(i)}$ were sent without the perturbation vector $r$. Then this vector is in the linear span of $\{v_{\sigma(0)}, \ldots, v_{\sigma(m)}\}$, which is most likely linearly independent, by the way these $v_i$'s are chosen. Then one can solve for the $m+1 < n$ coefficients $x_i$ in $w = \sum_{i=0}^{m} x_i v_{\sigma(i)}$. These coefficients are unique by linear independence; thus $x_i = \delta_i$, and we recover the plaintext.
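The linear-algebra attack on unperturbed ciphertexts amounts to solving an exact linear system. The sketch below uses Gauss-Jordan elimination over the rationals. The public vectors are random integer stand-ins, not drawn from the scheme's distribution, and the function name is hypothetical.

```python
import random
from fractions import Fraction


def solve_in_span(V, w):
    """Solve w = sum_i x_i * V[i] exactly; return None if w is not in the span.

    Gauss-Jordan elimination over the rationals on the augmented n x (k+1)
    system whose i-th column is V[i] (k = m + 1 vectors in dimension n).
    """
    k, n = len(V), len(w)
    A = [[Fraction(V[i][r]) for i in range(k)] + [Fraction(w[r])] for r in range(n)]
    row, pivots = 0, []
    for col in range(k):
        p = next((r for r in range(row, n) if A[r][col] != 0), None)
        if p is None:
            continue                                   # dependent column, no pivot
        A[row], A[p] = A[p], A[row]
        piv = A[row][col]
        A[row] = [x / piv for x in A[row]]
        for r in range(n):
            if r != row and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[row])]
        pivots.append(col)
        row += 1
    if any(A[r][k] != 0 for r in range(row, n)):
        return None                                    # inconsistent: w not in the span
    x = [Fraction(0)] * k
    for r, col in enumerate(pivots):
        x[col] = A[r][k]
    return x


rng = random.Random(5)
n, m = 8, 4
V = [[rng.randint(-50, 50) for _ in range(n)] for _ in range(m + 1)]  # stand-in public vectors
bits = [1, 0, 1, 1, 0]
w = [sum(d * v[r] for d, v in zip(bits, V)) for r in range(n)]       # ciphertext without r
print(solve_in_span(V, w) == bits)                                    # coefficients are the bits
noise = [Fraction(rng.randint(1, 99), 1000) for _ in range(n)]        # small perturbation r
print(solve_in_span(V, [a + e for a, e in zip(w, noise)]))            # almost surely None
```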

The addition of the random perturbation $r$ renders this attack ineffective, since with probability very near one, $w = \sum_{i=0}^{m} \delta_i v_{\sigma(i)} + r$ is not in the linear span of $\{v_{\sigma(0)}, v_{\sigma(1)}, \ldots, v_{\sigma(m)}\}$, which is of dimension at most $m+1$. (If $r$ were truly uniformly random from the ball $\|x\| < b/2$, then the probability that $w$ belongs to the lower-dimensional linear span would be zero; if $r$ is chosen with rational coordinates with exponentially large denominators, then this probability is exponentially small.) When the vector $w$ is not in the linear span, recovering the coefficients $\delta_i$ appears to be no easier than the well-known nearest lattice vector problem, which is believed to be intractable.

Finally, if the lengths of the vectors $v_i$ are not kept essentially the same, there can be statistical leakage of information (see Section 4). However, even if the $v_i$ are all roughly the same length, the number of message bits $m = cn$ should be less than $n$: if $m$ were equal to $cn$ for a constant $c > 1$, then there is the following cryptanalytic attack.
Suppose $\|v_i\| \approx V$ for each $i$ and define numbers $Q_i$ by

$$Q_i = \frac{v_i \cdot u}{\|v_i\|\,\|u\|} \approx \frac{v_i \cdot u}{V}.$$

Since the integers $N_i$ are superincreasing, we can show that for all $i < m - 3\log n$, $Q_i < Q_m/n^3$. In fact, let $m' = m - 3\log n$; then we can inductively prove that $Q_{m'+j} > 2^j Q_{m'-1} > 2^j Q_i$ for all $i < m'$. Thus for each $i < m'$ we have $Q_i < Q_m/n^3$. We will say that these $Q_i$ are "unusually small" (compared to $\max Q_j$). Of course, one cannot compute the $Q_i$'s, since one is given only the ordering permuted by $\sigma$, and $u$ is secret.

The attack begins with the selection of a random subset of $n-1$ vectors $v_i$. If all $n-1$ vectors have an unusually small dot product with the secret vector $u$, then the normal vector perpendicular to all these $n-1$ vectors will be a good approximation to $u$. From this one can break the system. We show next that with non-trivial probability all $n-1$ vectors have an unusually small dot product with the secret vector $u$. This probability is at least

$$\frac{\binom{cn - 3\log n}{n-1}}{\binom{cn}{n-1}} \ge \left(1 - \frac{3\log n}{cn - n}\right)^{n},$$

which, for a constant $c > 1$, is bounded below by an inverse polynomial in $n$. Thus one can try a polynomial number of times, and with high probability one will find such a set of $n-1$ vectors and break the system. This attack will not work if $m = n/2$.



4 Statistical analysis 

It is clear from the discussion that the secret permutation $\sigma$ and the random perturbation $r$ are both necessary. With a secret permutation $\sigma$, however, an adversary may still attempt to find or approximate the secret vector $u$. In this section, the random perturbation $r$ does not play an essential role; it is easier to discard $r$ in the following analysis, which is essentially the same with $r$, although a little less clean. Thus we will carry out the following analysis with $b = 0$ and $r = 0$.

A natural attack is to gather statistical information by computing some values associated with the vectors $v_{\sigma(0)}, \ldots, v_{\sigma(m)}$ which are invariant under permutations. It is conceivable, for example, that $\sum_{i=0}^{m} v_{\sigma(i)} = \sum_{i=0}^{m} v_i$ might have a non-trivial correlation with the secret direction $u$, since each $v_i$ has a positive component in the $u$ direction. We will show that if we did not choose our distribution for the $v_i$'s carefully, then indeed this attack may succeed; but the distribution $D$ appears to be secure against this attack.

Consider again the structural requirement that $v_0 \cdot u > 0$ and $v_i \cdot u > (v_0 + \cdots + v_{i-1}) \cdot u$. A natural distribution for the $v_i$'s is to choose increment vectors $w_i$ so that $v_i = (v_0 + \cdots + v_{i-1}) + w_i$, where the $w_i$ are independently and uniformly distributed on the $(n-1)$-dimensional hemisphere $S^{n-1}_+ = \{x \in \mathbb{R}^n \mid \|x\| = 1,\ x \cdot u > 0\}$, which consists of all the unit vectors in $\mathbb{R}^n$ with a positive component in the $u$ direction. We will call this distribution $F$.

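Let $s_i = v_0 + \cdots + v_i$, $0 \le i \le m$. Then $v_0 = w_0$, $v_i = s_{i-1} + w_i$, and $s_i = 2^i w_0 + 2^{i-1} w_1 + \cdots + 2^0 w_i$ by an easy induction. We need some preliminaries. Let $\beta_n$ denote the $n$-dimensional volume of the unit $n$-ball, and let $\delta_{n-1}$ denote the $(n-1)$-dimensional volume of the unit $(n-1)$-sphere; then

$$\beta_n = \int_0^1 \delta_{n-1} r^{n-1}\, dr = \frac{\delta_{n-1}}{n}, \qquad n \ge 2.$$

And

$$\beta_n = \int_{-1}^{1} \beta_{n-1} \left(\sqrt{1-h^2}\right)^{n-1} dh = 2 \beta_{n-1} I_n,$$

where the integral $I_n = \int_0^{\pi/2} \sin^n \theta \, d\theta = \frac{n-1}{n} I_{n-2} = \cdots \approx \sqrt{\frac{\pi}{2n}}$ asymptotically for large $n$. Also

$$\beta_n = \int_0^{2\pi} \int_0^1 \beta_{n-2} \left(\sqrt{1-r^2}\right)^{n-2} r \, dr \, d\theta = \beta_{n-2} \frac{2\pi}{n} = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)}.$$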
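These volume identities are easy to sanity-check numerically. The standard-library sketch below compares the recurrences $\beta_n = 2\beta_{n-1} I_n$ and $\beta_n = 2\pi\beta_{n-2}/n$ with the closed form, and checks $I_n \approx \sqrt{\pi/(2n)}$. The function names are our own.

```python
import math


def I(n):
    """I_n = integral_0^{pi/2} sin^n(theta) dtheta, via I_n = (n-1)/n * I_{n-2}."""
    if n == 0:
        return math.pi / 2
    if n == 1:
        return 1.0
    return (n - 1) / n * I(n - 2)


def beta(n):
    """Volume of the unit n-ball, closed form pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


for n in range(2, 12):
    assert math.isclose(beta(n), 2 * beta(n - 1) * I(n))          # beta_n = 2 beta_{n-1} I_n
    assert math.isclose(beta(n), 2 * math.pi * beta(n - 2) / n)   # beta_n = beta_{n-2} 2pi/n
print(I(400) / math.sqrt(math.pi / 800))   # close to 1, since I_n ~ sqrt(pi/(2n))
```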

We will use the uniform distribution $U$ on sets such as the hemisphere $S^{n-1}_+$, namely the Lebesgue measure on $S^{n-1}_+$, and we will denote a random variable $X$ uniformly distributed on such a set $S$ by $X \in_U S$. The following analysis
is carried out using the exact Lebesgue measure. In the actual cryptographic 
protocols, this must be replaced by an exponentially close approximation on the 
set of rational points with exponentially large (polynomially bounded in binary 
length) denominators. The errors are exponentially small and thus insignificant. 
For clarity of presentation, we will state all results in terms of the exact Lebesgue 
measure. 

Lemma 1. Let $w, w' \in_U S^{n-1}_+$ be independently and uniformly distributed on the unit (northern) hemisphere. Let $u$ be the north pole. Then the expectation of the inner product is

$$E[w \cdot u] = \frac{1}{(n-1) I_{n-2}} \approx \sqrt{\frac{2}{\pi n}}.$$

Also,

$$E[(w \cdot u)^2] = \frac{1}{n}, \qquad E[(w \cdot u)(w' \cdot u)] = (E[w \cdot u])^2 \approx \frac{2}{\pi n}.$$




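Proof. For $w \in_U S^{n-1}_+$, the density function for the value of the inner product $h = w \cdot u$ is

$$p_{n-1}(h) = \frac{\left(\sqrt{1-h^2}\right)^{n-3}}{I_{n-2}}, \qquad 0 \le h \le 1.$$

Hence

$$E[w \cdot u] = \int_0^1 h\,p_{n-1}(h)\,dh = \frac{1}{(n-1)I_{n-2}} = \frac{\Gamma(n/2)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n+1}{2}\right)} \approx \sqrt{\frac{2}{\pi n}}.$$

Similarly

$$E[(w \cdot u)^2] = \int_0^1 h^2\,p_{n-1}(h)\,dh = \frac{1}{n}.$$

We note in passing that $E_{S^{n-1}}[(w \cdot u)^2]$ over the whole unit sphere $S^{n-1}$ is $1/n$ as well, by the symmetry $h \mapsto -h$.

The last equality follows from the independence of $w$ and $w'$:

$$E[(w \cdot u)(w' \cdot u)] = E[hh'] = E[h]\,E[h'] = (E[h])^2 \approx \frac{2}{\pi n}. \qquad \Box$$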
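To make Lemma 1 concrete, a Monte Carlo check is sketched below in Python. This is our illustration rather than anything from the paper: it takes $u = e_n$ and samples the hemisphere by normalising a standard Gaussian vector and reflecting its last coordinate to be non-negative, which yields the uniform distribution on $S^{n-1}_+$. The function names are hypothetical.

```python
import math
import random


def hemisphere_h(n: int, rng: random.Random) -> float:
    """Return h = w . u for w uniform on S^{n-1}_+, taking u = e_n.

    A normalised Gaussian vector is uniform on S^{n-1}; taking the absolute
    value of its last coordinate reflects it onto the northern hemisphere.
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return abs(x[-1]) / math.sqrt(sum(c * c for c in x))


def exact_mean(n: int) -> float:
    """E[w . u] = Gamma(n/2) / (sqrt(pi) * Gamma((n+1)/2)), via log-gamma."""
    return math.exp(math.lgamma(n / 2) - math.lgamma((n + 1) / 2)) / math.sqrt(math.pi)


def lemma1_estimates(n: int, trials: int, seed: int = 1) -> tuple[float, float, float]:
    """Monte Carlo estimates of E[w.u], E[(w.u)^2] and E[(w.u)(w'.u)]."""
    rng = random.Random(seed)
    s1 = s2 = sc = 0.0
    for _ in range(trials):
        h = hemisphere_h(n, rng)        # w . u
        h_prime = hemisphere_h(n, rng)  # w' . u, independent of w
        s1 += h
        s2 += h * h
        sc += h * h_prime
    return s1 / trials, s2 / trials, sc / trials
```

With, say, `n = 20` and ten thousand trials the three estimates should land close to `exact_mean(20)`, `1/20` and `exact_mean(20)**2`; the exact mean itself tends to $\sqrt{2/(\pi n)}$ for large `n`.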



Lemma 2. Let $w, w', w'' \in_U S^{n-1}_+$ be independently and uniformly distributed on the unit hemisphere. Then

$$E[w \cdot w'] = (E[w \cdot u])^2 \approx \frac{2}{\pi n}, \qquad E[(w \cdot w')^2] = \frac{1}{n},$$

$$E[(w \cdot w')(w \cdot w'')] = (E[w \cdot u])^2\,E[(w \cdot u)^2] \approx \frac{2}{\pi n^2}.$$

Proof. Let $w, w', w'' \in_U S^{n-1}_+$. Choose a coordinate system so that $u$ is the $n$th coordinate. Then $w \cdot w' = \sum_{i=1}^{n} x_i(w)\,y_i(w')$. By linearity and independence, $E[w \cdot w'] = \sum_{i=1}^{n} E[x_i]\,E[y_i]$. For $i < n$, the symmetry $x_i \mapsto -x_i$ implies that $E[x_i] = 0$. For $i = n$, $x_n(w) = w \cdot u$, and similarly for $y_n(w')$. It follows that $E[x_n] = E[y_n] = E[h]$, and

$$E[w \cdot w'] = (E[h])^2 \approx \frac{2}{\pi n}.$$

For $E[(w \cdot w')^2]$, expand $\left(\sum_{i=1}^{n} x_i y_i\right)^2 = \sum_{i=1}^{n} x_i^2 y_i^2 + \sum_{i \ne j} x_i y_i x_j y_j$. For $i \ne j$, at least one of $i$ or $j$ is not $n$, and by independence and the symmetry $x_i \mapsto -x_i$ we have $E[x_i y_i x_j y_j] = E[x_i x_j]\,E[y_i y_j] = 0$. Thus the expectation of the second term is $0$. By linearity and independence,

$$E[(w \cdot w')^2] = \sum_{i=1}^{n} E[x_i^2]\,E[y_i^2].$$

For $i = n$, the term is $(E[h^2])^2 = 1/n^2$. For $i < n$, by the symmetry $x_n \mapsto -x_n$, $E[x_i^2]$ is the same as if we were to evaluate this expectation under the uniform distribution on the whole unit sphere. But on the whole sphere this is the same as $E_{S^{n-1}}[x_n^2] = E_{S^{n-1}}[h^2]$. This, in turn, by the symmetry $h \mapsto -h$, is the same again if we evaluate it back on the hemisphere $S^{n-1}_+$. Hence ultimately $E[x_i^2] = E[h^2] = 1/n$, and $E[x_i^2]\,E[y_i^2] = 1/n^2$. It follows that

$$E[(w \cdot w')^2] = n \cdot \frac{1}{n^2} = \frac{1}{n}.$$

Finally, for $E[(w \cdot w')(w \cdot w'')]$, we expand the product $(w \cdot w')(w \cdot w'') = \sum_{i=1}^{n} x_i^2 y_i z_i + \sum_{i \ne j} x_i y_i x_j z_j$. For $i \ne j$, at least one of them is not $n$, so that either $E[y_i] = 0$ or $E[z_j] = 0$; thus $E[x_i x_j]\,E[y_i]\,E[z_j] = 0$. Then

$$E[(w \cdot w')(w \cdot w'')] = \sum_{i=1}^{n} E[x_i^2]\,E[y_i]\,E[z_i].$$

For $i < n$, $E[y_i] = 0$ by symmetry as before. For $i = n$, the term is $E[h^2]\,(E[h])^2 \approx \frac{2}{\pi n^2}$. $\Box$
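The three moments in Lemma 2 can likewise be checked by simulation. The Python sketch below is ours and uses hypothetical helper names; as before it takes $u = e_n$, samples $S^{n-1}_+$ by reflecting a normalised Gaussian vector, and compares the estimates with the exact values $\mu^2$, $1/n$ and $\mu^2/n$, where $\mu = E[w \cdot u]$ from Lemma 1.

```python
import math
import random


def hemisphere_point(n: int, rng: random.Random) -> list[float]:
    """Uniform point on S^{n-1}_+ (north pole u = e_n) via a reflected normalised Gaussian."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(c * c for c in x))
    x = [c / r for c in x]
    x[-1] = abs(x[-1])
    return x


def dot(a: list[float], b: list[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def lemma2_estimates(n: int, trials: int, seed: int = 2) -> tuple[float, float, float]:
    """Monte Carlo estimates of E[w.w'], E[(w.w')^2] and E[(w.w')(w.w'')]."""
    rng = random.Random(seed)
    a = b = c = 0.0
    for _ in range(trials):
        w, w1, w2 = (hemisphere_point(n, rng) for _ in range(3))
        p, q = dot(w, w1), dot(w, w2)  # w.w' and w.w'' share the vector w
        a += p
        b += p * p
        c += p * q
    return a / trials, b / trials, c / trials


def lemma2_exact(n: int) -> tuple[float, float, float]:
    """Exact values from Lemma 2, with mu = E[w.u] = Gamma(n/2) / (sqrt(pi) Gamma((n+1)/2))."""
    mu = math.exp(math.lgamma(n / 2) - math.lgamma((n + 1) / 2)) / math.sqrt(math.pi)
    return mu ** 2, 1 / n, mu ** 2 / n
```

Note that the third estimate uses the same $w$ in both inner products, which is exactly what makes it differ from a product of two independent means.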

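For the distribution $F$, we will show that the secret information $u$ is not safe. In fact we claim that $s_m$ can be used to approximate the direction $u$. Consider $s_m \cdot u = 2^m (w_0 \cdot u) + \cdots + 2^0 (w_m \cdot u)$. Then

$$E_F[s_m \cdot u] = (2^m + \cdots + 2^0)\,E[w \cdot u] \approx 2^{m+1}\sqrt{\frac{2}{\pi n}}.$$

Next we compute the variance $\mathrm{Var}_F[s_m \cdot u]$. First,

$$(s_m \cdot u)^2 = 2^{2m}(w_0 \cdot u)^2 + \cdots + 2^0 (w_m \cdot u)^2 + \sum_{0 \le i \ne j \le m} 2^{m-i}\,2^{m-j}\,(w_i \cdot u)(w_j \cdot u).$$

So

$$\begin{aligned}
E_F[(s_m \cdot u)^2] &= (2^{2m} + \cdots + 2^0)\,E[(w \cdot u)^2] + \sum_{0 \le i \ne j \le m} 2^{2m-i-j}\,E[(w \cdot u)(w' \cdot u)] \\
&= \frac{4^{m+1}-1}{3n} + \left((2^{m+1}-1)^2 - \frac{4^{m+1}-1}{3}\right)(E[w \cdot u])^2 \\
&\approx \frac{4^{m+1}}{3n} + \frac{4^{m+2}}{3\pi n} = \frac{4^{m+1}}{3\pi n}\,(4 + \pi).
\end{aligned}$$

It follows that

$$\mathrm{Var}_F[s_m \cdot u] = E_F[(s_m \cdot u)^2] - (E_F[s_m \cdot u])^2 \approx \frac{4^{m+1}}{3\pi n}\,(\pi - 2).$$

We note that the normalized ratio

$$\frac{E_F[s_m \cdot u]}{\sqrt{\mathrm{Var}_F[s_m \cdot u]}} \approx \sqrt{\frac{6}{\pi - 2}} \approx 2.2925564.$$

This indicates that $s_m$ has a significant correlation with the hidden direction $u$, and hence $u$ cannot be considered secure under this distribution $F$.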
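The ratio can also be reproduced numerically. In the Python sketch below (ours, with hypothetical names), only $h_i = w_i \cdot u$ enters $s_m \cdot u$, and for $w_i$ uniform on $S^{n-1}_+$ we use the standard fact that $h_i^2$ follows a $\mathrm{Beta}(1/2, (n-1)/2)$ distribution. `predicted_ratio` keeps the finite-$m$ and finite-$n$ terms that the text drops, so it approaches $\sqrt{6/(\pi-2)}$ only when both are large.

```python
import math
import random


def mean_h(n: int) -> float:
    """E[w.u] for w uniform on the hemisphere S^{n-1}_+ (Lemma 1)."""
    return math.exp(math.lgamma(n / 2) - math.lgamma((n + 1) / 2)) / math.sqrt(math.pi)


def predicted_ratio(n: int, m: int) -> float:
    """E_F[s_m.u] / sqrt(Var_F[s_m.u]) without the large-m, large-n approximations."""
    s1 = 2 ** (m + 1) - 1            # 2^m + ... + 2^0
    s2 = (4 ** (m + 1) - 1) / 3      # 4^m + ... + 4^0
    mh = mean_h(n)
    # The h_i are independent, so Var[s_m.u] = s2 * Var[h] with Var[h] = 1/n - mh^2.
    return s1 * mh / math.sqrt(s2 * (1 / n - mh * mh))


def simulated_ratio(n: int, m: int, trials: int, seed: int = 3) -> float:
    """Monte Carlo under F, drawing h_i = sqrt(Beta(1/2, (n-1)/2)) directly."""
    rng = random.Random(seed)
    vals = []
    for _ in range(trials):
        vals.append(sum(2 ** (m - i) * math.sqrt(rng.betavariate(0.5, (n - 1) / 2))
                        for i in range(m + 1)))
    mu = sum(vals) / trials
    var = sum((v - mu) ** 2 for v in vals) / (trials - 1)
    return mu / math.sqrt(var)
```

Sampling $h_i$ directly avoids generating full $n$-dimensional vectors, which keeps the simulation cheap even for a few hundred dimensions.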

More directly it can be shown that 



l)] ^ 

EF[||S77i||]EF[||Um — Um-l||] “ 

asymptotically. Thus one can expect $s_m$ to be usable for distinguishing $v_m$ from the others.

We now return to our chosen distribution $D$, and show that in this distribution there is no easy statistical leakage. In this distribution, $v_i = \frac{2^i}{M}\,u + \sqrt{1 - \frac{4^i}{M^2}}\,\rho_i$, and the $\rho_i$ are independently and uniformly distributed on the $(n-2)$-dimensional unit sphere orthogonal to $u$. Recall $\|v_i\| = 1$. Let $s'_m = v_0 + \cdots + v_m$.

We consider $\|s'_m\|^2$ and $s'_m \cdot u$. Clearly $\|s'_m\|^2 = (m+1) + \sum_{0 \le i \ne j \le m} (v_i \cdot v_j)$.

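Lemma 3. For $0 \le i \ne j \le m$,

$$E_D[v_i \cdot v_j] = \frac{2^{i+j}}{M^2} \quad\text{and}\quad \mathrm{Var}_D[v_i \cdot v_j] = \left(1 - \frac{4^i}{M^2}\right)\left(1 - \frac{4^j}{M^2}\right)\frac{1}{n-1} \approx \frac{1}{n-1}.$$

Proof. We have $v_i \cdot v_j = \frac{2^{i+j}}{M^2} + \sqrt{1 - \frac{4^i}{M^2}}\sqrt{1 - \frac{4^j}{M^2}}\,(\rho_i \cdot \rho_j)$. By symmetry, $E_{S^{n-2}}[\rho_i \cdot \rho_j] = 0$, so that $E_D[v_i \cdot v_j] = \frac{2^{i+j}}{M^2}$. Thus

$$\mathrm{Var}_D[v_i \cdot v_j] = \left(1 - \frac{4^i}{M^2}\right)\left(1 - \frac{4^j}{M^2}\right)\mathrm{Var}_D[\rho_i \cdot \rho_j].$$

We have $\mathrm{Var}_D[\rho_i \cdot \rho_j] = E_D[(\rho_i \cdot \rho_j)^2] = \int_{-1}^{1} h^2\,\frac{p_{n-2}(|h|)}{2}\,dh = \frac{1}{n-1}$. The lemma follows. $\Box$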

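Now

$$E_D[\|s'_m\|^2] = (m+1) + \sum_{0 \le i \ne j \le m} \frac{2^{i+j}}{M^2},$$

and $\sum_{0 \le i \ne j \le m} 2^{i+j} = \sum_{0 \le i, j \le m} 2^{i+j} - \sum_{i=0}^{m} 2^{2i} \approx \frac{2}{3}\left(2^{m+1}\right)^2$. Hence $E_D[\|s'_m\|^2] \approx (m+1) + \frac{2^{2m+3}}{3M^2}$. Ignoring the exponentially small term $2^{2m}/M^2$, $E_D[\|s'_m\|^2] \approx m + 1$.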

One should compare this with the uniform distribution $U$, for which all the $v_i$ are independently and uniformly distributed on $S^{n-1}$. In this case $E_U[v_i \cdot v_j] = 0$ for $i \ne j$, and $E_U[\|s'_m\|^2] = m + 1$.
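We next evaluate the variance $\mathrm{Var}_D[\|s'_m\|^2]$:

$$\begin{aligned}
\mathrm{Var}_D[\|s'_m\|^2] &= E_D\!\left[\left(\sum_{0 \le i \ne j \le m} \left(v_i \cdot v_j - E_D[v_i \cdot v_j]\right)\right)^{\!2}\,\right] \\
&= 4E_D\!\left[\left(\sum_{0 \le i < j \le m} \sqrt{1 - \tfrac{4^i}{M^2}}\sqrt{1 - \tfrac{4^j}{M^2}}\,(\rho_i \cdot \rho_j)\right)^{\!2}\,\right] \\
&= 4E_D\!\left[\sum_{0 \le i < j \le m} \left(1 - \tfrac{4^i}{M^2}\right)\left(1 - \tfrac{4^j}{M^2}\right)(\rho_i \cdot \rho_j)^2\right] \\
&\quad + 4E_D\!\left[\sum_{(i<j) \ne (i'<j')} \sqrt{1 - \tfrac{4^i}{M^2}}\sqrt{1 - \tfrac{4^j}{M^2}}\sqrt{1 - \tfrac{4^{i'}}{M^2}}\sqrt{1 - \tfrac{4^{j'}}{M^2}}\,(\rho_i \cdot \rho_j)(\rho_{i'} \cdot \rho_{j'})\right].
\end{aligned}$$

For $(i<j) \ne (i'<j')$ there are two cases. If $i, j, i', j'$ are all distinct indices, then clearly $\rho_i \cdot \rho_j$ and $\rho_{i'} \cdot \rho_{j'}$ are independent. Thus $E_D[(\rho_i \cdot \rho_j)(\rho_{i'} \cdot \rho_{j'})] = E_D[\rho_i \cdot \rho_j]\,E_D[\rho_{i'} \cdot \rho_{j'}] = 0$. If there are only 3 distinct indices among $(i, j, i', j')$, say $i = i'$, then after fixing $\rho_i$, the conditional distributions of $\rho_i \cdot \rho_j$ and $\rho_i \cdot \rho_{j'}$ over $\rho_j$ and $\rho_{j'}$ are independent, and $E_D[(\rho_i \cdot \rho_j) \mid \rho_i] = E_D[(\rho_i \cdot \rho_{j'}) \mid \rho_i] = 0$. Thus in any case $E_D[(\rho_i \cdot \rho_j)(\rho_{i'} \cdot \rho_{j'})] = 0$ for $(i<j) \ne (i'<j')$, and

$$\mathrm{Var}_D[\|s'_m\|^2] = 4\sum_{0 \le i < j \le m} \left(1 - \frac{4^i}{M^2}\right)\left(1 - \frac{4^j}{M^2}\right)E_D[(\rho_i \cdot \rho_j)^2].$$

We have $E_D[(\rho_i \cdot \rho_j)^2] = E_{S^{n-2}}[h^2] = 1/(n-1)$. Ignoring exponentially small terms such as $2^m/M$, we get

$$\mathrm{Var}_D[\|s'_m\|^2] \approx \frac{4}{n-1}\binom{m+1}{2}.$$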

This is to be compared to the uniform distribution $U$. Again $\|s'_m\|^2 = (m+1) + \sum_{0 \le i \ne j \le m} (v_i \cdot v_j)$. But for the uniform distribution $U$, $E_U[v_i \cdot v_j] = 0$ for $i \ne j$, and so

$$\mathrm{Var}_U[\|s'_m\|^2] = E_U\!\left[\left(\sum_{0 \le i \ne j \le m} (v_i \cdot v_j)\right)^{\!2}\,\right] = 4E_U\!\left[\sum_{(i<j),\,(i'<j')} (v_i \cdot v_j)(v_{i'} \cdot v_{j'})\right].$$

By the same argument, $E_U[(v_i \cdot v_j)(v_{i'} \cdot v_{j'})] = 0$ for all $(i<j) \ne (i'<j')$. Hence $\mathrm{Var}_U[\|s'_m\|^2] = 4\sum_{0 \le i < j \le m} E_U[(v_i \cdot v_j)^2]$, where $E_U[(v_i \cdot v_j)^2]$ is taken over the $(n-1)$-dimensional unit sphere, and is thus equal to $1/n$ (see the proof of Lemma 1). It follows that

$$\mathrm{Var}_U[\|s'_m\|^2] = \frac{4}{n}\binom{m+1}{2}.$$



We conclude that, at least in terms of the length of the sum $\|s'_m\| = \|v_0 + \cdots + v_m\|$, our distribution $D$ behaves very much like the uniform distribution $U$.
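Both variance formulas can be checked by simulation. The Python sketch below is our illustration under stated assumptions: it takes $u = e_n$, requires $M > 2^m$, realises $\rho_i$ as a normalised Gaussian vector in the first $n-1$ coordinates, and uses hypothetical function names. Passing `M=None` switches to the uniform distribution $U$.

```python
import math
import random


def unit(g: list[float]) -> list[float]:
    r = math.sqrt(sum(c * c for c in g))
    return [c / r for c in g]


def draw(n: int, m: int, rng: random.Random, M=None) -> list[list[float]]:
    """Draw v_0..v_m with u = e_n.

    M=None: distribution U, each v_i uniform on S^{n-1}.
    Otherwise: distribution D, v_i = (2^i/M) u + sqrt(1 - 4^i/M^2) rho_i,
    with rho_i uniform on the unit sphere orthogonal to u (needs M > 2^m).
    """
    vs = []
    for i in range(m + 1):
        if M is None:
            vs.append(unit([rng.gauss(0.0, 1.0) for _ in range(n)]))
        else:
            a = 2 ** i / M
            rho = unit([rng.gauss(0.0, 1.0) for _ in range(n - 1)])
            vs.append([math.sqrt(1 - a * a) * c for c in rho] + [a])
    return vs


def norm_sq_stats(n: int, m: int, trials: int, M=None, seed: int = 4):
    """Sample mean and variance of ||s'_m||^2 with s'_m = v_0 + ... + v_m."""
    rng = random.Random(seed)
    xs = []
    for _ in range(trials):
        s = [sum(col) for col in zip(*draw(n, m, rng, M))]
        xs.append(sum(c * c for c in s))
    mu = sum(xs) / trials
    return mu, sum((x - mu) ** 2 for x in xs) / (trials - 1)
```

For example, with `n = 30`, `m = 6` and `M = 2.0**40`, both means should be near `m + 1`, and the variances near `4/(n-1) * comb(m+1, 2)` under $D$ and `4/n * comb(m+1, 2)` under $U$; the two differ only through $n-1$ versus $n$.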

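We return to the correlation between $s'_m$ and $u$. It is easy to see that with distribution $D$

$$s'_m \cdot u = \sum_{i=0}^{m} \frac{2^i}{M} = \frac{2^{m+1} - 1}{M} < \frac{2^{m+1}}{M},$$

which is exponentially small. Also, since it is a constant, $\mathrm{Var}_D[s'_m \cdot u] = 0$. For the uniform distribution $U$,

$$s'_m \cdot u = \sum_{i=0}^{m} v_i \cdot u,$$

and $E_U[s'_m \cdot u] = 0$. For the variance,

$$\mathrm{Var}_U[s'_m \cdot u] = E_U[(s'_m \cdot u)^2] = E_U\!\left[\left(\sum_{i=0}^{m} v_i \cdot u\right)^{\!2}\,\right] = E_U\!\left[\sum_{i=0}^{m} (v_i \cdot u)^2 + \sum_{0 \le i \ne j \le m} (v_i \cdot u)(v_j \cdot u)\right].$$

By independence, $E_U[(v_i \cdot u)(v_j \cdot u)] = E_U[v_i \cdot u]\,E_U[v_j \cdot u] = 0$ for $i \ne j$. Also $E_U[(v_i \cdot u)^2] = 1/n$. Hence $\mathrm{Var}_U[s'_m \cdot u] = (m+1)/n$. Therefore, statistically, one cannot deduce much from $s'_m \cdot u$ in the distribution $D$, since it is exponentially small and well within the range this value would occupy under the uniform distribution, where $E_U[s'_m \cdot u] = 0$ and $\mathrm{Var}_U[s'_m \cdot u] = \Omega(1)$.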
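The contrast between the two distributions for $s'_m \cdot u$ can be illustrated with a short simulation. The Python sketch below is ours (hypothetical names, $u = e_n$): under $D$ the $u$-components of the $v_i$ are the constants $2^i/M$, so every sample equals $(2^{m+1}-1)/M$; under $U$ each $u$-component is the last coordinate of a uniform point on $S^{n-1}$, giving mean $0$ and variance $(m+1)/n$, one term $1/n$ for each of $v_0, \dots, v_m$.

```python
import math
import random


def projection_samples(n: int, m: int, trials: int, M=None, seed: int = 5) -> list[float]:
    """Samples of s'_m . u with u = e_n.

    Only the u-components of v_0..v_m matter. Under D (M given) they are
    the constants 2^i/M; under U (M=None) each is the last coordinate of a
    normalised Gaussian vector, i.e. of a uniform point on S^{n-1}.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        total = 0.0
        for i in range(m + 1):
            if M is None:
                g = [rng.gauss(0.0, 1.0) for _ in range(n)]
                total += g[-1] / math.sqrt(sum(c * c for c in g))
            else:
                total += 2 ** i / M
        out.append(total)
    return out
```

The $D$ samples are all identical and tiny, while the $U$ samples spread over a range of order $\sqrt{(m+1)/n}$, which is the point made above.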

In fact, suppose $u' \in S^{n-1}$ is any unit vector, $u' \ne \pm u$. The estimates $E_U[s'_m \cdot u'] = 0$ and $\mathrm{Var}_U[s'_m \cdot u'] = \Omega(1)$ are still valid, with the same variance as for $u$. Let $\Pi$ be the 2-dimensional plane spanned by $u$ and $u'$. Let $u' = (\cos\theta)\,u + (\sin\theta)\,u^{\perp}$, where the unit vector $u^{\perp} \perp u$. Then we can choose a coordinate system such that $u^{\perp}$ is the $(n-1)$st coordinate for $\rho_i$, and $E_D[\rho_i \cdot u^{\perp}] = E_{S^{n-2}}[x_{n-1}] = 0$. Therefore $E_D[s'_m \cdot u'] = (\cos\theta)\,E_D[s'_m \cdot u]$. Thus $|E_D[s'_m \cdot u']| < 2^{m+1}/M$, which is exponentially small. This implies that $s'_m$ has correlation with no particular direction, the same as under the uniform distribution $U$.
Abstract. Most modern security protocols and security applications are defined to be algorithm independent, that is, they allow a choice from a set of cryptographic algorithms for the same function. Although an algorithm switch is rather difficult with traditional hardware (i.e., ASIC) implementations, Field Programmable Gate Arrays (FPGAs) offer a promising solution. Similarly, an ASIC-based key-search machine is in general only applicable to one specific encryption algorithm. However, a key-search machine based on FPGAs can also be algorithm independent and thus be applicable to a wide variety of ciphers. We researched the feasibility of a universal key-search machine using the Data Encryption Standard (DES) as an example algorithm.

We designed, implemented and compared various architecture options of DES with strong emphasis on high-speed performance. Techniques like pipelining and loop unrolling were used and their effectiveness for DES on FPGAs investigated. The most interesting result is that we could achieve encryption rates beyond 400 Mbit/s using a standard Xilinx FPGA. This is about 30 times faster than software implementations, while still maintaining flexibility. A DES cracker chip based on this design could search 6.29 million keys per second.



1 Introduction 

Most modern security protocols are defined to be algorithm independent, and the protocol standards support a variety of algorithms. Although it is fairly easy to switch cryptographic algorithms in software, it is often painfully difficult in hardware. On the other hand, hardware solutions provide better performance and higher physical security. One answer to this problem is reconfigurable hardware based on modern Field Programmable Gate Array (FPGA) devices. Since FPGAs can switch algorithms, they can be used to build algorithm-agile applications. Although only one algorithm is configured at any given time, the FPGA can be reconfigured with a different algorithm. The main advantages of implementing cryptographic algorithms on FPGAs are:
- Algorithm agility: the same FPGA can be reprogrammed at run time to support different algorithms.

- Scalable security, through different versions of the same algorithm (e.g., Data Encryption Standard (DES) and triple-DES).

- Alterable architecture parameters: desirable features such as variable S-boxes, a variable number of rounds, or different modes of operation can easily be realized.

Key-search machines are another interesting cryptographic application of FPGAs. Building a key-search machine in hardware is a major investment, and such a machine can be used to retrieve secret keys for only one algorithm. Therefore it might be interesting to have a key-search machine that is also algorithm independent, supporting a variety of algorithms.

We designed a universal key-search machine and used a high-speed DES implementation for FPGAs as an example algorithm. DES is currently the most widely used private-key algorithm, and it is also part of many other standards, e.g., the IPSec protocols, ATM cell encryption, the Secure Socket Layer (SSL) protocol, and various ANSI banking standards. Even though it is expected that DES will not be reapproved as a US federal standard this year, it is still important and will continue to play a major role for several years to come.

In Sect. 2 we summarize previous relevant work. Section 3 explores different architecture options for DES, such as loop unrolling and pipelining, and presents the architecture versions we decided to implement. In Sect. 4 we provide an overview of the projected universal key-search machine. Section 5 describes the design and implementation cycle and gives an overview of the hardware and software tools we used for our research. In Sect. 6 we present and compare the results of our implementations of the different architectures and extrapolate the cost and speed of the proposed universal key-search machine running a DES cracker chip.



2 Previous Work 

Already one year after the Data Encryption Standard was released in 1976, Whitfield Diffie and Martin Hellman published a paper which describes in detail a key-search machine for DES [?]. They estimated that for $20 million a key-search machine could be built which recovers a DES key within 12 hours. Two papers with quite different results appeared in 1993 [?, ?]. Both papers describe a custom chip design and develop a key-search machine from it. Reference [?] calculates 3.5 hours to break DES at $1 million; its chip can test 50 million keys per second. The design shown in [?] uses 32 DES breaker (DESB) chips and breaks DES in 1.2 days; each DESB chip can test 5333 million keys per second. This number appears to be very high. The estimated cost, including overhead, is only $2500, which appears to be very low to us.

Modern custom hardware implementations of DES can achieve data rates of 1 Gbit/sec and beyond. References [?] were the first report of a custom chip employing modern GaAs technology to achieve 1 Gbit/sec. This design, as well as many commercially available devices, does not support a key change at full speed, i.e., a different key for every block of plain text.

The first paper to show an implementation of DES on FPGAs is [?]. Their approach generates key-specific circuitry for the Xilinx FPGAs. Therefore a binary image (bit-stream) for each key has to be precomputed before it can be used in the device. Since a key-search machine has to change the key after every encryption, this design is not suitable for that task. Their fastest implementation, without decryption and adjusted to one key, achieves a data rate of 26 Mbit/sec.

In 1996 a group of cryptographers analyzed the security of the key lengths of symmetric ciphers based on current technology. In their report [?], employing FPGAs is highlighted as an efficient approach for a brute-force attack against cryptographic systems. FPGAs are inexpensive and fast, and they need less initial investment than Application-Specific Integrated Circuits (ASICs). If we take only the plain FPGA cost of our design into account, we would need to invest three times as much money as [?] to recover keys at the same speed.

3 Architecture Options for the DES Algorithm 

A high-speed implementation of DES is crucial for an efficient DES cracker. Speed is not the only important factor, however: the design must also support a fast key change.



3.1 Basic DES 

The DES algorithm possesses an iterative structure [?]. Data is passed through the Feistel network 16 times, each time with a different sub-key from the key transformation. This structure lends itself naturally to the block diagram shown in Fig. 1. The incoming data and key are passed through initial permutations. Then the data passes 16 times through the Feistel network, and the 16 sub-keys are generated simultaneously. Both the Feistel network operation and the sub-key generation are denoted in the block diagram as Combinatorial Logic (CLU, combinatorial logic unit). In order to loop the output back to the input of the combinatorial logic unit, we need registers and multiplexers. The multiplexer switches the input of the combinatorial logic unit between data from the previous round and new input data and key. The registers store the results of each loop and pass them on to the multiplexer. The output of the data register passes through the final permutation.



3.2 Loop Unrolling 

Loop unrolling is the concatenation of combinatorial units in order to reduce the number of iterations. If, for instance, two loops are unrolled, then two rounds of DES are calculated in one clock cycle. Figure 2 shows the block diagram. It differs from Fig. 1 only in the second combinatorial logic unit.




Fig. 1. DES block diagram 




Fig. 2. Block diagram of DES with two unrolled loops 



The initial and final permutations as well as the registers and multiplexers are 
the same. 
Loop unrolling potentially leads to speed improvements for the following reason. In the version without unrolling, one iteration of DES has the following simple timing model: T_mux + T_cl + T_reg, where T_mux denotes the time a signal needs to pass through a multiplexer, T_cl the delay introduced by the combinatorial logic, and T_reg the delay introduced by the register. Wiring delays are assumed to be included in the specified times for the elements. So for all 16 rounds this sums up to 16 * T_mux + 16 * T_cl + 16 * T_reg.

The equation for one loop of the structure in Fig. 2 is thus T_mux + 2 * T_cl + T_reg. This has to be executed 8 times, so that the overall delay is now 8 * T_mux + 16 * T_cl + 8 * T_reg. The same principle can be applied to four unrolled DES rounds, resulting in 4 * T_mux + 16 * T_cl + 4 * T_reg.
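The timing model generalises to any unroll factor that divides 16. The Python sketch below is our illustration of that formula only; the function name and the delay values used in the example are hypothetical and not measurements from this work.

```python
def unrolled_block_delay(unroll: int, t_mux: float, t_cl: float, t_reg: float,
                         rounds: int = 16) -> float:
    """Delay for one DES block when `unroll` rounds share one clock cycle.

    Each of the rounds/unroll iterations passes once through the multiplexer,
    through `unroll` combinatorial logic units, and through the register.
    """
    if rounds % unroll:
        raise ValueError("unroll factor must divide the number of rounds")
    iterations = rounds // unroll
    return iterations * (t_mux + unroll * t_cl + t_reg)


# Illustrative (made-up) delays in ns: T_mux = 3, T_cl = 30, T_reg = 2.
for k in (1, 2, 4):
    print(k, unrolled_block_delay(k, 3.0, 30.0, 2.0))
```

Whatever delays are plugged in, the combinatorial term stays at 16 * T_cl; only the multiplexer and register overhead shrinks as the unroll factor grows.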

Obviously we cannot reduce the delay introduced by the combinatorial logic units, but we reduce the number of passes through the multiplexers and registers. There is also another motivation for a speed increase if modern design methods are applied: the synthesis tools may be able to optimize an unrolled design better, due to boundary optimization. Loop unrolling thus works well with a modern design process, which can reduce the logic complexity and delay of the design.



3.3 Pipelining 

Pipelining tries to achieve a speed improvement in a different way. Instead of processing one block of data at a time, a pipelined design can process two or more data blocks simultaneously. A design with two pipeline stages is shown in Fig. 3. The block diagram in Fig. 3 is very similar to the one with two unrolled loops (Fig. 2). The only difference is the additional buffer between the combinatorial logic units.

The first block of data X_1 and the associated key k_1 are loaded and passed through the initial permutations and the multiplexer. The 1st combinatorial logic unit computes X_{1,1} and k_{1,1}, which are stored into the 1st register block. On the next clock cycle X_{1,1} and k_{1,1} leave the 1st registers, and the 2nd combinatorial logic unit computes X_{1,2} and k_{1,2}, which are put into the 2nd register block. At the same time, the second block of data X_2 and the key k_2 are loaded and passed through the initial permutations and the multiplexer, and the 1st combinatorial unit computes X_{2,1} and k_{2,1}, which are moved into the 1st register block.

Now the pipeline is filled, and with each clock cycle another iteration is computed for each of the two data-key pairs. The data which entered the pipeline first will also exit it first. At that time the next data and key pair can be loaded.
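The fill-and-stream behaviour described above can be checked with a small cycle-level model. The Python sketch below is ours: `toy_round` is a hypothetical 64-bit Feistel round used only as a placeholder for the DES round, and the model ignores the initial and final permutations. It assumes, as in Fig. 3, that the output of the last register loops back through the multiplexer to the first CLU, and that a new data-key pair is loaded only when a finished block leaves that slot.

```python
from collections import deque

MASK32 = 0xFFFFFFFF


def toy_round(x: int, key: int, r: int) -> int:
    """Hypothetical stand-in for one Feistel round on a 64-bit block (NOT the DES round)."""
    left, right = x >> 32, x & MASK32
    f = ((right * 0x9E3779B1) ^ key ^ r) & MASK32
    return (right << 32) | (left ^ f)


def reference(x: int, key: int, rounds: int = 16) -> int:
    """Plain iterative evaluation, one round after another."""
    for r in range(rounds):
        x = toy_round(x, key, r)
    return x


def pipelined(blocks, stages: int, rounds: int = 16):
    """Cycle-level model of a ring of `stages` CLUs, each followed by a register.

    Returns the outputs (in input order) and the number of clock cycles used.
    """
    if rounds % stages:
        raise ValueError("number of stages must divide the number of rounds")
    pending = deque(blocks)
    regs = [None] * stages  # regs[s] = (data, key, rounds_done) latched after CLU s
    results, cycles = [], 0
    while True:
        back = regs[-1]
        if back is not None and back[2] == rounds:  # finished: leaves the pipeline
            results.append(back[0])
            back = None
        if back is None and pending:                # free slot: multiplexer loads new pair
            x, k = pending.popleft()
            back = (x, k, 0)
        inputs = [back] + regs[:-1]
        if all(b is None for b in inputs):
            return results, cycles
        # One clock: every occupied CLU applies one round, registers latch the results.
        regs = [None if b is None else (toy_round(b[0], b[1], b[2]), b[1], b[2] + 1)
                for b in inputs]
        cycles += 1
```

In this model, N blocks (N a multiple of the number of stages s) finish in 16N/s + s - 1 cycles: each block still needs 16 cycles, but s blocks are in flight at once, which is the source of the speed-up.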

The advantage of this design is that two or more data-key pairs can be worked on at the same time. As there is still only one instance of the initial permutations, the multiplexer and the final permutation, the cost in terms of resources on the chip will not be twice as high as if we implemented two full non-pipelined DES designs. Also, only one control logic is needed, and it is just slightly more complicated than for a non-pipelined DES design. The maximum clock speed should be roughly the same, since during one clock cycle the same amount of logic resources has to be traversed as in the non-pipelined design. It is also straightforward to design pipelines with more than two stages, e.g., with four.

Fig. 3. Block diagram of DES with two pipeline stages



3.4 Combination of Pipelining and Loop Unrolling 

It is possible to combine both architecture acceleration techniques described above. For instance, each pipeline stage can contain two unrolled loops. The resulting block diagram looks similar to Fig. 3, except that each combinatorial logic unit is duplicated. During one clock cycle two iterations of two data-key pairs are computed: X_{1,4} and k_{1,4} are computed from X_{1,2} and k_{1,2}, and X_{2,2} and k_{2,2} are computed from X_2 and k_2. An extension to four unrolled loops per pipeline stage is also possible.



3.5 Design Decisions 

One major objective of this research was to obtain a realistic comparison of the different acceleration methods (loop unrolling, pipelining, and a combination of both) for FPGAs. Table 1 shows the architecture versions we decided to implement; their results are described in Sect. 6.
Table 1. Implemented DES architectures

Name          Description
DES_ED16      standard DES (16 iterations)
DES_ED8       DES with 2 unrolled loops (8 iterations)
DES_ED4       DES with 4 unrolled loops (4 iterations)
DES_ED16x2    DES with 2 pipeline stages
DES_ED16x4    DES with 4 pipeline stages
DES_ED8x2     DES with 2 pipeline stages, each containing 2 unrolled loops



4 Universal Key-Search Machine 

The design of the Universal Key-Search Machine has to be algorithm independent. Therefore no assumption is made about the length of the plain text, cipher text, or key. The system architecture is described in the following subsection, and the modifications to our DES FPGA [?] that are necessary to meet the I/O requirements of this architecture are described in the subsection after that.



4.1 System Architecture 

Our biggest concern is scalability. Therefore one design goal is to minimize the amount of wiring and additional external logic (glue logic). A bus topology and an optimized addressing scheme were developed to meet this requirement. The addressing scheme requires no address decoding logic, and there is no logical limit on the number of key-search chips connected to the bus.

Figure 4 gives an overview of the system architecture. When more key-search chips are attached, additional buffers are needed to drive the bus; these bus drivers are left out of Fig. 4 for simplicity. The cracker-bus comprises a bidirectional 16-bit-wide data bus; the control signals IE (input enable) and OE (output enable); four chip select signals, CS (chip select), CSI (chip select in), CSO (chip select out), and CSCLK (chip select clock); and a done indicator. The function of the control chip is to interface the cracker-bus with the ISA bus of a PC. We use DESC, the DES Cracker chip, as an example.

The system supports sequential as well as parallel access. When CS is active, all chips are selected and listening to the bus. The plain text and the cipher text can be sent with IE to every chip at the same time. A different start key, however, has to be sent to each chip. For this purpose we implemented a sequential addressing scheme. The signals CSI, CSO, and CSCLK make the chips work like one long shift register. The control chip sends a logic ‘1’ to the first chip. With each CSCLK clock this ‘1’ is passed from CSO of one chip to CSI of the next chip; the next chip is selected and its start key can be sent with IE. The number of CSCLK clock cycles it takes for the ‘1’ to come back to the control chip is equal to the number of key-search chips installed. With OE the current key of the selected chip is clocked out. With this function, the state of the key-search machine can be saved or the final key can be retrieved. The done signal is a tristated signal pulled to logic ‘0’ with a resistor. When the key is found, this signal goes to ‘1’. The control chip then addresses the chips sequentially. The chip which found the key takes the ‘1’ off as soon as it is selected. Through this scheme the controller can identify the chip.

Fig. 4. System architecture
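The shift-register addressing and the done-line search can be modelled in a few lines. The Python sketch below is an idealised, hypothetical model (no bus timing, no tristate electrical behaviour); `chain_length` and `find_winner` are our names, and we assume the ‘1’ is latched into the first chip before the first CSCLK edge.

```python
def chain_length(n_chips: int) -> int:
    """CSCLK cycles until the '1' injected at the first chip returns to the controller.

    Modelling assumption: the '1' sits in chip 0 before the first CSCLK, and each
    CSCLK moves it from one chip's CSO to the next chip's CSI.
    """
    if n_chips < 1:
        raise ValueError("need at least one chip")
    chain = [1] + [0] * (n_chips - 1)
    cycles = 0
    while True:
        out = chain[-1]               # CSO of the last chip, seen by the controller
        chain = [0] + chain[:-1]      # one CSCLK shifts the token by one chip
        cycles += 1
        if out:
            return cycles


def find_winner(found: list[bool]):
    """Walk the select token until the shared 'done' line drops back to '0'.

    found[i] is True if chip i matched the key; a matching chip stops driving
    'done' while it is selected. Returns the chip index, or None if none matched.
    """
    if not any(found):
        return None
    for selected in range(len(found)):
        done = any(f and i != selected for i, f in enumerate(found))
        if not done:
            return selected
    return None
```

In this model no address decoder is needed: the controller only counts CSCLK pulses, which is why adding chips to the bus requires no change to the addressing logic.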



4.2 DES Cracker-Chip 

The DES cracker chip (see Fig. 5) is based on our DES FPGA [?], denoted as DES Core in the figure. Each of the implemented DES architectures shown in Table 1 can be used as a DES Core. Our DES FPGA is not optimized for key search, but it supports a key change for every block of plain text at full speed. The buffers for the cipher text and the plain text are 64 bits wide, and the buffer for the start key is 56 bits wide. The bus can transfer 16 bits at a time; therefore the buffers are split up into parts of at most 16 bits. They are addressed through a shift register (not shown in Fig. 5). When the chip is selected, the signal IE cycles through the input buffers. The 4x16-bit output register for the current key is addressed via a 4-bit shift register, which in turn is operated through the OE signal when the chip is selected.




Fig. 5. DES Cracker-Chip 



A 56-bit counter takes care of incrementing the key for each new encryption. A 56-bit compare unit compares the encrypted plain text with the cipher text. If both are the same, the done signal goes high. The chip select circuitry and the control logic are not shown in the figure.
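The counter-and-compare loop of one cracker chip can be sketched as follows. This Python model is our illustration only: `toy_encrypt` is a hypothetical stand-in for the DES core built from SHA-256 (it is not DES), and for simplicity it compares the full 64-bit output rather than modelling the 56-bit compare unit.

```python
import hashlib


def toy_encrypt(key: int, plaintext: bytes) -> bytes:
    """Hypothetical stand-in for the DES core (NOT DES): 64-bit output derived from SHA-256."""
    return hashlib.sha256(key.to_bytes(7, "big") + plaintext).digest()[:8]


def cracker_chip(plaintext: bytes, ciphertext: bytes, start_key: int,
                 count: int, key_bits: int = 56):
    """Model of one DESC chip: a key counter drives the core, and a comparator
    checks each result against the known cipher text."""
    mask = (1 << key_bits) - 1
    key = start_key & mask
    for _ in range(count):
        if toy_encrypt(key, plaintext) == ciphertext:
            return key                # 'done' would go high here
        key = (key + 1) & mask        # 56-bit counter, wrapping around
    return None
```

Because each chip receives its own start key, the key space can be split into disjoint ranges, one per chip, with all chips sharing the same plain text and cipher text.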

5 Methodology 

This section describes the design cycle and the tools used for our research on the DES core. The design procedure can roughly be broken down into the following stages:

1. Creating VHDL¹ descriptions of the DES design employing different architecture options, and verifying each version.

2. Synthesis and logic optimization.

3. Place and route for a specific device, and back-annotated verification of the design.

¹ VHDL is the VHSIC Hardware Description Language. VHSIC is an abbreviation for Very High Speed Integrated Circuit.

Early in the design we decided upon Xilinx as the FPGA vendor and on a device family. That decision was based mainly on our previous work described in [?]. The entire design was implemented using VHDL and vendor-specific macros. We used Synopsys version 1997.08 for synthesis, logic optimization, design verification, and timing analysis.

Xilinx Alliance Series version M1.3.7 provided macro functions (LogiBLOX) and was used for place and route. The Xilinx tools also perform a timing analysis after place and route, which shows the minimum clock period for the given design. This clock period is guaranteed by Xilinx for the design and should therefore be regarded as rather pessimistic. We use this timing result for our speed calculations.

6 Results 

We implemented multiple versions of each architecture option listed in Table 1 in order to evaluate their effectiveness. In the following sections we compare the different designs. In most cases the designs are compared to the design DES_ED16, which serves as our reference model.

The unit CLB stands for configurable logic block, which Xilinx uses to measure the amount of logic resources on a device. We use it here to compare the amount of logic resources used by a given design. The abbreviation CLU stands for combinatorial logic unit (see Sect. 3.1), i.e., one Feistel network round.



6.1 Loop Unrolling 

We implemented two loop-unrolled versions: DES_ED8 and DES_ED4. The design DES_ED8 contains two combinatorial logic units (CLUs, see Sect. 3.1) and therefore encrypts or decrypts one data block in 8 clock cycles. The design DES_ED4 contains four CLUs and provides the result after 4 clock cycles. Both designs are compared with the design DES_ED16 in Table 2.



Table 2. Comparison of loop unrolled architectures

Design      Chip               CLBs   CLBs      Min CLK   Data rate per CLU   Data rate   Data rate
                               used   per CLU   (ns)      (Mbit/s)            (Mbit/s)    (rel.)
DES_ED16    XC4008E-3-PG223    262    262       40.4       99.1                99.1       1.00
DES_ED8     XC4013E-3-PG223    443    222       54.0       74.1               148.2       1.50
DES_ED4     XC4028EX-3-PG299   722    241       86.7       46.1               184.5       1.86
Our standard DES design DES_ED16 already allowed data rates of 99.1 Mbit/sec². The design DES_ED8, at 148.2 Mbit/sec, is 50% faster than DES_ED16, whereas the resource consumption (in CLBs) increases by 69%. The design DES_ED4, at 184.5 Mbit/sec, is only 25% faster than DES_ED8; the speed increase is only half as much as from the first unrolling. The resource consumption increases by 63%.

The number of CLBs divided by the number of CLUs indicates that the amount of logic resources consumed per unrolled CLU is almost constant. The speed divided by the number of CLUs shows that the speed per CLU in the design DES_ED4 is less than half that of DES_ED16. From this we see that loop unrolling yields a non-linear speed improvement. It seems unlikely that further loop unrolling would yield significantly higher performance.

6.2 Pipelining 

We implemented two pipelined designs, DES_ED16x2 and DES_ED16x4. The design DES_ED16x2 contains two CLUs and therefore 2 pipeline stages, and the design DES_ED16x4 contains four CLUs and therefore 4 pipeline stages. In both cases the encryption or decryption of one block of data takes 16 clock cycles. Table 3 compares both designs with the design DES_ED16.



Table 3. Comparison of pipelined architectures

Design      Chip               CLBs   CLBs      Min CLK   Data rate per CLU   Data rate   Data rate
                               used   per CLU   (ns)      (Mbit/s)            (Mbit/s)    (rel.)
DES_ED16    XC4008E-3-PG223    262    262       40.4       99.1                99.1       1.00
DES_ED16x2  XC4013E-3-PG223    433    217       43.5       91.9               183.8       1.86
DES_ED16x4  XC4028EX-3-PG299   741    185       39.7      100.7               402.7       4.06



The design DES_ED16x2 with two pipeline stages, at 183.8 Mbit/sec, is almost twice as fast as our reference design DES_ED16, and our design DES_ED16x4 achieves 402.7 Mbit/sec, which is four times faster. The speed divided by the number of CLUs shows that the speed per CLU stays almost constant for all designs. The lower speed of the design DES_ED16x2 is caused by the lack of wiring resources on the device, which results in a less efficient design.

The amount of logic resources consumed per implemented CLU decreases as we add more pipeline stages. This is because the control unit does not become more complicated if we implement more pipeline stages. Also, the multiplexers are implemented only once.

It is interesting to compare the pipelined designs with the loop-unrolled designs. It can be seen that DES_ED16x2 is both faster and smaller than the loop-unrolled DES_ED8. The difference is even more dramatic if DES_ED16x4 is compared with DES_ED4: DES_ED16x4 is more than twice as fast as DES_ED4 and uses almost the same number of CLBs.

² 1 Mbit = 10^6 bit

6.3 Combination of Pipelining and Loop Unrolling 

A design that contains loop unrolling as well as pipelining is, even in its simplest version, already as large as the largest designs we have implemented so far, DES_ED16x4 and DES_ED4. Therefore we implemented only the design DES_ED8x2, which contains 4 CLUs, 2 in each of the 2 pipeline stages. Table 4 compares this design with DES_ED16x2 and DES_ED8.



Table 4. Comparison of a combined architecture with others

Design      Chip               CLBs   Min CLK   Data rate per pipeline   Data rate
                                      (ns)      (Mbit/s)                 (Mbit/s)
DES_ED8x2   XC4028EX-3-PG299   733    48.0      166.5                    333.0
DES_ED16x2  XC4013E-3-PG223    433    43.5       91.9                    183.8
DES_ED8     XC4013E-3-PG223    443    54.0      148.2                    148.2



It is not easy to compare this mixed design with the two other designs. The minimum clock period shows that two loop-unrolled CLUs execute faster in the design DES_ED8x2 than in the design DES_ED8 (48.0 ns versus 54.0 ns). They are, of course, slower than one CLU in the design DES_ED16x2 (43.5 ns), but surprisingly not by much.
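The data rates in Tables 2 to 4 follow from the minimum clock period: each design completes as many 64-bit blocks as it has pipeline stages every (16 / unrolled CLUs per stage) clock cycles. The Python sketch below is our reconstruction of that arithmetic, not the authors' tool; the helper name is hypothetical. The recomputed rates agree with the tables to within about 0.1%, a residual possibly due to rounding of the listed clock periods.

```python
def data_rate_mbit(min_clk_ns: float, iterations: int, stages: int = 1,
                   block_bits: int = 64) -> float:
    """Throughput in Mbit/s: `stages` blocks finish every `iterations` clock cycles."""
    return stages * block_bits / (iterations * min_clk_ns * 1e-9) / 1e6


# (name, clock cycles per block, pipeline stages, min clock period in ns, reported Mbit/s)
DESIGNS = [
    ("DES_ED16",   16, 1, 40.4,  99.1),
    ("DES_ED8",     8, 1, 54.0, 148.2),
    ("DES_ED4",     4, 1, 86.7, 184.5),
    ("DES_ED16x2", 16, 2, 43.5, 183.8),
    ("DES_ED16x4", 16, 4, 39.7, 402.7),
    ("DES_ED8x2",   8, 2, 48.0, 333.0),
]

for name, iters, stages, clk, reported in DESIGNS:
    rate = data_rate_mbit(clk, iters, stages)
    print(f"{name:11s} {rate:6.1f} Mbit/s (reported {reported})")
```

The formula makes the trade-off visible: unrolling shortens the cycle count but lengthens the clock period, while pipelining multiplies the number of blocks in flight at a roughly unchanged clock period.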



6.4 Extrapolation for Cracker 

Our fastest design is DES_ED16x4, with a data rate of 402.7 Mbit/sec. This design could be the DES Core for our DES Cracker-Chip (DESC) as shown in Sect. 4.2. After the plain text, cipher text and start key are loaded into the buffers, the DES Core can start working at full speed. We expect to achieve the same data rate for the DESC chip as with DES_ED16x4. The data rate of 402.7 Mbit/sec corresponds to a key-search rate of 6.29 million keys per second. This means that a brute-force attack on DES with 2^56 keys would take 182 years on average. A fully unrolled design would theoretically search 25 million keys per second, which is about half the speed of the key-search ASIC introduced in [?].
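The key-search figures follow from simple arithmetic. The Python sketch below is our reconstruction, assuming one trial key per encrypted 64-bit block, an average search over half the key space, and a 365-day year; the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def keys_per_second(data_rate_mbit: float, block_bits: int = 64) -> float:
    """One trial key per encrypted 64-bit block."""
    return data_rate_mbit * 1e6 / block_bits


def average_search_years(key_bits: int, keys_per_sec: float) -> float:
    """Expected time to hit the key: on average half the key space is searched."""
    return 2 ** (key_bits - 1) / keys_per_sec / SECONDS_PER_YEAR


rate = keys_per_second(402.7)  # DES_ED16x4 as the DES core
print(f"{rate / 1e6:.2f} million keys/s")
print(f"{average_search_years(56, rate):.0f} years on average for 2^56 keys")
print(f"{keys_per_second(4 * 402.7) / 1e6:.1f} million keys/s for four times the throughput")
```

The last line corresponds to the "fully unrolled" figure of about 25 million keys per second, which is four times the throughput of the four-stage design.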

A cracker box comprising one control board and 16 boards, each with four DESC chips (using DES_ED16x4 as the DES core), connected to a PC could search the entire key space of DES with a 40-bit key in one hour. The FPGA costs alone for such a box are about $18,000³. When we include 50% overhead for the other required circuits, boards and racks, and an additional $1000 for the PC (a low-end PC is fast enough), this cracker box would cost about $28,000.

³ Based on actual (May 1998) pricing supplied by a vendor.

Assume that eight such boxes could be attached to one PC and that 160 PCs are
connected via a network. This machine comprises 81,920 DESC chips and would
cost about $35 million. It could find a 56-bit DES key in one day on average.
We note that FPGA prices are decreasing at a very high rate and that a
key-search machine at a considerably lower price might be feasible in the near
future.
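The extrapolation above is simple arithmetic. The following Python sketch reproduces it under the text's own assumptions: one DES key is tested per 64-bit block processed, an average search covers half of the key space, and the box and machine sizes are those given above. The function names are illustrative only. (402/64 gives 6.28 rather than 6.29 million keys per second; the small difference does not affect the conclusions.)

```python
# Key-search arithmetic for DES_ED16x4-based DESC chips (illustrative sketch).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def keys_per_second(data_rate_mbit: float, block_bits: int = 64) -> float:
    """One key is tested per encrypted block: keys/s = (bits/s) / block size."""
    return data_rate_mbit * 1e6 / block_bits

def average_search_seconds(key_bits: int, chips: int, rate: float) -> float:
    """On average, half of the 2**key_bits keys must be tried."""
    return 2 ** (key_bits - 1) / (chips * rate)

rate = keys_per_second(402.0)                     # one DES_ED16x4 core
chip_years = average_search_seconds(56, 1, rate) / SECONDS_PER_YEAR
box_chips = 16 * 4                                # 16 boards, four DESC chips each
box_minutes = 2 ** 40 / (box_chips * rate) / 60   # entire 40-bit key space
machine_chips = 160 * 8 * box_chips               # 160 PCs, eight boxes per PC
machine_hours = average_search_seconds(56, machine_chips, rate) / 3600

print(f"{rate / 1e6:.2f} million keys/s per chip")
print(f"one chip, 56-bit key: {chip_years:.0f} years on average")
print(f"one box, full 40-bit space: {box_minutes:.0f} minutes")
print(f"{machine_chips} chips, 56-bit key: {machine_hours:.1f} hours on average")
```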

7 Comparison 

We implemented all DES designs on standard Xilinx devices with a medium
speed grade. With these devices we achieved speeds of up to 402.0 Mbit/sec
using four pipeline stages and no loop unrolling. Comparing the reported DES
speeds for high-speed ASICs (1600 Mbit/sec) and high-speed software
(13 Mbit/sec) with our best result of 402.0 Mbit/sec, we conclude that the
speed-up factor from software to FPGAs is about 31, and from FPGAs to ASICs
about 4. It is difficult to provide a fair comparison in this respect, but our
results clearly show that FPGA implementations of DES are very attractive for
many applications. If we compare our result with the one from B, we see that
even though the same device was used (although at a slower speed grade), the
fastest implementation in the reference is slower by a factor of three and
requires almost twice as many logic resources as our DES_ED16. Note that the
fastest implementation in the reference is tailored to a single key.

Using our fastest design as the DES core for the DESC chip, we can achieve
a key-search rate of 6.29 million keys per second. For $38 million a machine can
be built that breaks DES with $2^{56}$ keys in one day on average. Although it
might very well be possible to build a DES-specific key-search machine (much)
more cheaply, we note that our design can be applied to a wide variety of block
ciphers simply by downloading a different algorithm into the FPGAs.
Considering the significant cost of a key-search machine, this can be a major
advantage if more than one algorithm is to be attacked.
Abstract. MMX is a new technology to accelerate multimedia applications
on Pentium processors. We report an implementation of IDEA on a Pentium
MMX that is 1.65 times faster than any previously known implementation on
the Pentium. By parallelizing four IDEAs we reach an unprecedented
78 Mbits/s throughput per output block on a 166MHz MMX. In light of the
rapidly increasing popularity of multimedia applications, which causes more
dedicated hardware to be built, and observing that most current block ciphers
do not benefit from MMX, we raise the problem of designing block ciphers (and
encryption modes) that fully utilize the basic operations of multimedia.

Keywords: block ciphers, fast implementations, IDEA, multimedia ar- 
chitectures, Pentium MMX. 



1 Introduction 

The second main objective in designing cryptographic primitives, besides
security, is speed: even a 10% difference in speed (at the same security level)
may bias industry to prefer one cipher to another. Still, it is not an easy task
to compare ciphers in terms of speed. The reasons are manifold. They depend on
the human factor (the best known implementation may not be the best possible
implementation) but also on the available hardware: ciphers optimized for 32-bit
processors may not be optimal on 64-bit processors and vice versa. The
application of new microprocessor techniques (DSP, Digital Signal Processing;
VLIW, Very Long Instruction Word; SIMD, Single Instruction Multiple Data) in
current general-purpose microprocessors will significantly sway our beliefs about
the speed ratio of available ciphers.

Because of the quickly increasing importance of multimedia, dedicated hardware
will be commonplace tomorrow. Today’s multimedia extensions (to name a few,
Intel’s MMX, Sun’s VIS, HP’s MAX-2, Cyrix’s MMX and AMD’s 3DNow!) are
just the first flowers. New generations of multimedia-enhanced processors will
change even further our judgment of what it means for software to be
optimized.

MMX, incorporated in every new Intel processor (e.g., in the Pentium with 
MMX and the Pentium II), is a relatively new extension made to accelerate 
multimedia applications. Considering the worldwide spread of MMX capable 
computers, design and implementation of cryptographic primitives utilizing the 
basic operations of multimedia applications should be considered very seriously. 
Some work in this area has already been done by designing new hash functions
and stream ciphers. Biham viewed a 64-bit processor as a SIMD parallel
computer, which can compute 64 one-bit operations simultaneously, getting
significant acceleration of DES. Using the same method ("bit-slicing"), several
papers have later improved Biham’s results.

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 24S-^^| 1999.

© Springer-Verlag Berlin Heidelberg 1999

There is a wide variety of block ciphers in more or less general use. The
popularity of some of these ciphers is based on trust in their design. The
popularity of others is based on high throughput combined with reasonable
security. In particular, the block cipher IDEA is believed to be very secure due
to the proper interaction between three different group operations. Although
IDEA seems to be, apart from DES, the most studied block cipher, no currently
known attack against the full IDEA performs better than exhaustive search. The
interaction between three different group operations adds confidence in IDEA’s
security, but the frequent use of multiplication does not allow fast software
implementations on common microprocessors (Table 1).



Block cipher      Block size (bits)   Cycles   Mbits/s
Square            128                 244      87.1
Blowfish          64                  158      67.2
RC5-32/16         64                  199      53.4
CAST5             64                  220      48.3
DES               64                  340      31.2
SAFER (S)K-128    64                  418      25.4
Shark             64                  585      18.2
IDEA              64                  590      18.0
3DES              64                  928      11.4

Table 1. Performance in clock cycles per block of output and Mbits/s of several
block ciphers on a 166MHz Pentium, by Antoon Bosselaers.
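The Mbits/s column of Table 1 follows from the cycle counts. At a clock rate of f cycles per second, a cipher needing c cycles per n-bit block delivers f·n/c bits per second. A short Python check of the table under that assumption (166 MHz clock, as in the caption; the function name is ours):

```python
def throughput_mbit_s(clock_mhz: float, block_bits: int, cycles_per_block: float) -> float:
    """Mbit/s = (cycles per microsecond / cycles per block) * bits per block."""
    return clock_mhz * block_bits / cycles_per_block

# (cipher, block size in bits, cycles per block) as listed in Table 1
TABLE_1 = [
    ("Square", 128, 244), ("Blowfish", 64, 158), ("RC5-32/16", 64, 199),
    ("CAST5", 64, 220), ("DES", 64, 340), ("SAFER (S)K-128", 64, 418),
    ("Shark", 64, 585), ("IDEA", 64, 590), ("3DES", 64, 928),
]

for name, bits, cycles in TABLE_1:
    print(f"{name:15s} {throughput_mbit_s(166, bits, cycles):5.1f} Mbit/s")
```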






We describe an implementation of IDEA on MMX that is significantly faster
than the best possible implementation of IDEA on the standard Pentium. One
attempt to optimize IDEA on MMX has already been made: Masayasu Kumagai’s
implementation of non-standard IDEA encrypts three IDEA blocks in parallel,
achieving 45.6 Mbits/s per individual encryption on a 200MHz Pentium MMX.
Our implementation includes a fast version of standard IDEA and a parallel
version that is about twice as fast as Kumagai’s.

We chose the MMX architecture because it is the de facto standard. We chose
IDEA because of its practical importance, and because no other current
“industry-standard” block cipher seems to benefit from the Pentium MMX.
Moreover, in the following we demonstrate that IDEA utilizes only about one
third of the Pentium MMX and is, additionally, easily parallelized without
significant overhead. The resulting parallel “4-way IDEA” is faster than any of
the 64-bit block ciphers in Table 1. In this way we transform a relatively slow
(and, as generally believed, very secure) cipher into a very fast (and still very
secure) cipher. This observation leads us to raise the question of designing new,
multimedia-optimized block ciphers.

Section 2 gives a background to MMX and multimedia extensions. Section 3
outlines the basics of the IDEA algorithm. Section 4 describes our implementation
of IDEA on MMX. Section 5 briefly describes the fast parallel implementation of
IDEA. Section 6 takes a broader view of multimedia architectures, and Sect. 7
briefly explains why most block ciphers cannot be parallelized on MMX and
raises the problem of designing new block ciphers constituted like multimedia
applications. In Sect. 8 we outline the results, and finally Sect. 9 acknowledges
the people who have to be acknowledged.

2 Introduction to MMX 

At the time of writing this paper, Intel’s Pentium was the most widely used
general-purpose processor. We shall not present a detailed outline of the Intel
Pentium’s architecture (an interested reader may turn to the cited literature).

MMX (MultiMedia extensions) is a relatively new technology to enhance the
performance of advanced media and communication applications. The MMX
technology introduces new general-purpose instructions that operate in parallel
on multiple data elements packed into 64-bit quantities (the ‘SWAR’, SIMD
Within A Register, architecture). These instructions accelerate the performance
of multimedia applications such as motion video, combined graphics with video,
image processing, audio synthesis, speech synthesis and compression, telephony,
video conferencing, 2D graphics, and 3D graphics. These applications were
broken down to identify the most compute-intensive routines, which were then
analyzed in detail using advanced computer-aided engineering tools. The results
of this extensive analysis showed many common, fundamental characteristics
across these diverse software categories. The key attributes of these
applications were:

— Small integer data types (for example: 8-bit graphics pixels, 16-bit audio
samples).

— Small, highly repetitive loops. 

— Frequent multiplies and accumulates. 

— Compute-intensive algorithms. 

— Highly parallel operations. 

The new MMX instructions work on 8 new 64-bit registers called %mm0, ...,
%mm7. Some of the instructions have an 8-way parallel 8-bit, a 4-way parallel
16-bit, a 2-way parallel 32-bit and a 64-bit version, but most of the operations
(like multiplication and addition) exist only for some subset of these
possibilities. There are more operations for 8-bit and 16-bit data than for
larger data types (the “small data types” paradigm).
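To make SWAR concrete, the Python sketch below performs a 4-way 16-bit addition (the kind of operation MMX provides as paddw) inside a single 64-bit integer. It uses a standard carry-masking trick rather than real MMX hardware. Python integers stand in for the %mm registers, and the function names are ours.

```python
MASK64 = (1 << 64) - 1
HIGH_BITS = 0x8000_8000_8000_8000   # bit 15 of every 16-bit lane

def pack4(words):
    """Pack four 16-bit words into one 64-bit value; words[0] occupies bits 0..15."""
    value = 0
    for i, w in enumerate(words):
        value |= (w & 0xFFFF) << (16 * i)
    return value

def unpack4(value):
    """Split a 64-bit value into its four 16-bit lanes (lane 0 = bits 0..15)."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(4)]

def padd16(x, y):
    """4-way addition modulo 2**16 inside one 64-bit word (paddw-like).

    Adding only the low 15 bits of each lane cannot carry past bit 15 of
    that lane; the top bit of each lane is then fixed up with XOR."""
    low = (x & ~HIGH_BITS) + (y & ~HIGH_BITS)
    return (low ^ ((x ^ y) & HIGH_BITS)) & MASK64

x = pack4([0xFFFF, 1, 0x8000, 1234])
y = pack4([0x0001, 2, 0x8000, 4321])
print(unpack4(padd16(x, y)))   # -> [0, 3, 0, 5555]: each lane wraps independently
```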
All microprocessors in the Pentium family have another level of parallelism,
called super-scalar parallelism. In particular, most of the MMX instructions can 
be executed in both U and V pipelines (in parallel with any other instruction), 
with the following exceptions. 

— Multiplication requires three cycles (has latency 3) but can be pipelined,
resulting in one multiplication operation every clock cycle (has throughput 1).
Multiplication instructions cannot pair with other multiplication instructions.

— Shift, pack and unpack instructions cannot pair with each other.

— MMX instructions that access memory or integer registers can only execute
in the U-pipe and cannot be paired with any instructions that are not MMX
instructions.

— After updating an MMX register, one additional clock cycle must pass before
that MMX register can be moved to either memory or an integer register.

Throughput is 1 for every operation, and latency is 1 for every operation except
multiplication. It is important to understand the difference between the SIMD
parallelism provided by the MMX technology and super-scalar parallelism. The
former executes the same operation on up to eight different data elements in
one instruction. The latter executes two possibly different instructions during
the same machine cycle. Hence, the total level of parallelism inside a Pentium
MMX can be up to 16 (two instructions per cycle, each operating on up to eight
data elements).

Still, most applications do not benefit from MMX. Some of the limitations of
MMX (and of the Pentium family in general) are outlined below:

Maximum two operands. Pentium/MMX instructions have at most two
operands, causing a high frequency of move (movq) instructions in
Pentium/MMX programs.

Lack of registers. There are only 8 MMX registers, which is insufficient for
most compute-intensive applications.

Slow interaction with integer registers and memory. Data in memory 
has to be aligned to 64-bit boundaries (misalignment costs three cycles on the 
Pentium processor family) and arranged in a way that minimizes the number of 
cache misses. Correct data alignment may significantly expand the data struc- 
tures (in the worst case, expanded data will not fit into the cache). The delay 
for a cache miss is at least eight internal clock cycles. Pairing limitations were 
already mentioned. 

Limited number of instructions. MMX has only a limited set of specific
operations. Because of the slow interaction between the integer and MMX
register sets, small programs that make intensive use of both integer and MMX
instructions will generally not benefit from MMX.
No flags register. The MMX instruction set does not change the flags register,
and therefore the wide variety of branch instructions available on the Pentium is
not usable. The only two comparison operators on MMX (pcmpgt* and pcmpeq*:
greater than and equal to) act on signed data and set the corresponding bits of
the destination register to 1 (true) or 0 (false). Emulating other comparisons,
especially unsigned ones, takes additional time.



No commands with immediate operands. Immediate operands have to be 
loaded from memory or generated by other means (e.g., by xoring or comparing 
a register to itself). 



Only 16-bit signed multiplication. Applications that use unsigned
multiplication intensively may become significantly slower. IDEA multiplication
$\odot$ (Sect. 3), which is expensive to emulate using unsigned multiplication, is
even more expensive to emulate using only signed multiplication (see Sect. 4).
Emulating $\odot$ with the available MMX instructions needs two multiplications:
one to calculate the higher 16 bits of the result (pmulhw) and another to
calculate the lower 16 bits (pmullw).



Standard reference for MMX optimization is 




Definition 1. Let the subscript $s$ (resp. $u$) under a binary operator denote
signedness (resp. unsignedness) of the corresponding operation. Let $*_s$ and
$*_u$ be respectively the signed and unsigned multiplication operations from
$\mathbb{Z}_{2^{16}}^2$ to $\mathbb{Z}_{2^{32}}$ ($*_u$ is the standard integer
multiplication). Let $\mathrm{True}(\phi)$ be $2^{16}-1$ if $\phi$ is true and $0$
otherwise. Next we define several basic operators corresponding one-to-one to
the instructions of MMX. Actually the correspondence is 4-way, i.e., the MMX
instructions execute four such operations in parallel. Let

$$
\begin{aligned}
\mathrm{Cmpeq}(a,b) &= \mathrm{True}(a = b)\\
\mathrm{Cmpgt}(a,b) &= \mathrm{True}(a >_s b)\\
a \oplus b &= \text{bitwise XOR in } \mathbb{Z}_2^{16}\\
a \mathbin{\&} b &= \text{bitwise AND in } \mathbb{Z}_2^{16}\\
a \boxminus b &= a - b \bmod 2^{16}\\
a \boxplus b &= a + b \bmod 2^{16}\\
\mathrm{Subus}(a,b) &= (a \boxminus b) \mathbin{\&} \mathrm{True}(a >_u b)\\
\mathrm{Mull}(a,b) &= (a *_s b) \mathbin{\&} (2^{16}-1)\\
\mathrm{Mulh}(a,b) &= \lfloor (a *_s b)/2^{16} \rfloor .
\end{aligned}
$$
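As an executable reading of Definition 1, the Python sketch below models each operator on a single 16-bit word. It also includes a helper that applies an operator to four lanes, the way one MMX instruction would. The function names are ours; $\oplus$ and $\mathbin{\&}$ are simply Python's built-in ^ and &.

```python
MASK16 = 0xFFFF  # 2**16 - 1

def to_signed(x):
    """Read a 16-bit word as a two's-complement signed value."""
    return x - 0x10000 if x & 0x8000 else x

def true_mask(cond):
    """True(phi): the all-ones word 2**16 - 1 if phi holds, else 0."""
    return MASK16 if cond else 0

def cmpeq(a, b):
    return true_mask(a == b)                        # pcmpeqw

def cmpgt(a, b):
    return true_mask(to_signed(a) > to_signed(b))   # pcmpgtw (signed)

def sub(a, b):
    return (a - b) & MASK16                         # psubw: a - b mod 2**16

def add(a, b):
    return (a + b) & MASK16                         # paddw: a + b mod 2**16

def subus(a, b):
    return sub(a, b) & true_mask(a > b)             # psubusw: unsigned saturation

def mull(a, b):
    return (to_signed(a) * to_signed(b)) & MASK16   # pmullw: low word

def mulh(a, b):
    # pmulhw: floor(a *s b / 2**16), stored as a 16-bit two's-complement word
    return ((to_signed(a) * to_signed(b)) >> 16) & MASK16

def lanes4(op, xs, ys):
    """Apply a word operator to four lanes at once, as one MMX instruction does."""
    return [op(a, b) for a, b in zip(xs, ys)]
```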



3 Introduction to IDEA 



IDEA is, like most advanced block ciphers, an iterated cipher. IDEA consists of
8 identical rounds that map the 4-tuple of 16-bit round input, $(X_i)_{i=1}^{4}$,
and the 6-tuple of 16-bit round subkeys (expanded from the 128-bit key using
the key expansion algorithm) into the 4-tuple of 16-bit round output. After
eight rounds the output transformation is executed. A round consists of several
applications of three group operations, so the whole of IDEA can be presented
as a directed labeled graph with labels from the set $\{\oplus, \boxplus, \odot\}$
(Fig. 1).

Technically, let $d : \mathbb{Z}_{2^{16}} \to \mathbb{Z}^*_{2^{16}+1}$, with
$d(x) = 2^{16}$ if $x = 0$ and $d(x) = x$ otherwise. The group operations used
in IDEA are $a \oplus b$, $a \boxplus b$ and $a \odot b$, where

$$a \odot b := d^{-1}\bigl(d(a) \cdot d(b) \bmod (2^{16}+1)\bigr).$$
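A direct Python transcription of these definitions is useful as a reference when checking faster emulations. The names are ours, and nothing here is optimized.

```python
MOD = 0x10001  # 2**16 + 1, a prime

def d(x):
    """Map a 16-bit word into Z*_{2^16+1}: the word 0 stands for 2**16."""
    return 0x10000 if x == 0 else x

def d_inv(y):
    """Inverse of d: the residue 2**16 is stored as the word 0."""
    return 0 if y == 0x10000 else y

def idea_mul(a, b):
    """a (.) b := d^-1(d(a) * d(b) mod (2**16 + 1))."""
    return d_inv(d(a) * d(b) % MOD)

def idea_add(a, b):
    """a [+] b: addition modulo 2**16."""
    return (a + b) & 0xFFFF

def idea_xor(a, b):
    """a (+) b: bitwise XOR of 16-bit words."""
    return a ^ b
```

For instance, idea_mul(0, 0) is 1, because $2^{16} \equiv -1 \pmod{2^{16}+1}$.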

In particular, these operations were chosen so that no two of them are
distributive or associative with respect to each other. This guarantees that all
operations in the IDEA schematics must be executed in an order consistent with
the data dependencies. Operations that do not depend on each other’s output
can be executed in parallel: $M_1$ in parallel with $M_2$, $A_1$ with $A_2$, and
some of the XORs $E_i$ with each other. On a SIMD architecture, where only
similar operations can be executed simultaneously, $M_1$ and $M_2$ cannot be
performed in the same instruction as $A_1$ and $A_2$.

IDEA has most of the key attributes of the multimedia applications that guided
the design of MMX, which makes it an almost ideal candidate cipher to benefit
from MMX:

— IDEA has small integer data types (all the operations work on 16-bit data).
Small data values allow several of them to be packed into one register, so that
multiple plaintext blocks can be processed in parallel (one of the main factors
in effective parallelization).

— IDEA processes the same data over and over without requiring random
memory accesses, and therefore needs less interaction with the slow memory.
Additionally, IDEA has no operations that require expensive, non-parallelizable
table lookups (another main factor in effective parallelization).

— IDEA is based on two 16-bit operations that are common in multimedia
applications (16-bit multiplication and addition) and on exclusive or, which is a
primitive instruction in almost every microprocessor. Although IDEA’s
multiplication is not trivial to implement on MMX, MMX still provides some
speedup (compared to the Pentium) for every multiplication (an important
factor in getting an overall speedup).

4 Fast Implementation 

We have addressed all problems mentioned above and completed a fast imple- 
mentation of IDEA on a Pentium MMX. Some of the tasks we had to solve 
are outlined below. We assume the plaintext to be in an MMX register and the 
pointer to the key schedule in an integer register. The ciphertext can be read 
afterwards from the same MMX register. 

General optimization. Optimal use of registers, with a minimal number of
move instructions. Minimal use of memory: only constants and subkeys are
read from memory. Subkeys and constants are correctly aligned to avoid time
penalties due to data misalignment (the key schedule has therefore expanded
from 104 to 136 bytes). Data in memory is kept compact and reused to reduce
the number of cache misses. Only one integer register is used (as a pointer to
the round subkeys). Nothing is written from MMX registers to memory or
integer registers.

Effective use of super-scalar parallelism. In our implementation, 693
instructions are paired into 358 cycles. Such excellent pairing (0.517 cycles per
instruction) is a little miracle (compare with the > 0.56 cycles per instruction
achieved by Bosselaers when implementing hash functions for the Pentium) and
is definitely one of the sources of the effectiveness of our implementation.

Use of SIMD parallelism. $M_1$ and $M_2$ (resp. $A_1$ and $A_2$, ...) are
calculated in parallel by using the SWAR capability of MMX processors. This is
another main factor in increasing the speed of IDEA.

Emulation of $\odot$ using the available MMX instructions is, we believe, done
optimally. Below we briefly explain how.

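Lemma 1.

$$a *_u b = 2^{16} \cdot \bigl(\mathrm{Mulh}(a,b) + (a \mathbin{\&} \mathrm{Cmpgt}(0,b)) + (b \mathbin{\&} \mathrm{Cmpgt}(0,a))\bigr) + \mathrm{Mull}(a,b).$$

Proof:

$a, b \geq_s 0$. In this case $a *_s b = a *_u b$.

$a \geq_s 0$, $0 >_s b$. In this case, $a$ is a nonnegative and $b$ a negative
number. Thus, $a *_s b = a *_u (b - 2^{16}) = a *_u b - 2^{16} a$.

$b \geq_s 0$, $0 >_s a$. Complementary to the previous case.

$0 >_s a, b$. In this case,
$a *_s b = (a - 2^{16}) *_u (b - 2^{16}) = a *_u b - 2^{16} *_u a - 2^{16} *_u b + 2^{32} \equiv a *_u b - 2^{16} *_u a - 2^{16} *_u b \pmod{2^{32}}$.

Since $a *_s b = 2^{16}\,\mathrm{Mulh}(a,b) + \mathrm{Mull}(a,b)$ and
$\mathrm{Cmpgt}(0,x)$ is all ones exactly when $x$ is negative, the two masked
terms add back $2^{16} a$ and $2^{16} b$ precisely in the cases where they were
subtracted. Reading the identity modulo $2^{32}$ (which suffices, as
$a *_u b < 2^{32}$) completes the proof. ■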
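The identity of Lemma 1 can be checked mechanically. In the Python sketch below, unsigned_mul_via_signed rebuilds the 32-bit unsigned product using only the signed operations Mulh, Mull and Cmpgt. The high word is accumulated modulo $2^{16}$, as a 16-bit SIMD addition would do. The helper names are ours.

```python
MASK16 = 0xFFFF

def to_signed(x):
    return x - 0x10000 if x & 0x8000 else x

def mull(a, b):                      # pmullw: low word of the signed product
    return (to_signed(a) * to_signed(b)) & MASK16

def mulh(a, b):                      # pmulhw: high word of the signed product
    return ((to_signed(a) * to_signed(b)) >> 16) & MASK16

def cmpgt(a, b):                     # pcmpgtw: signed a > b as an all-ones mask
    return MASK16 if to_signed(a) > to_signed(b) else 0

def unsigned_mul_via_signed(a, b):
    """Lemma 1: the 32-bit unsigned product from signed-only operations.
    The high word is summed modulo 2**16, as paddw would do."""
    high = (mulh(a, b) + (a & cmpgt(0, b)) + (b & cmpgt(0, a))) & MASK16
    return (high << 16) | mull(a, b)

print(hex(unsigned_mul_via_signed(0xFFFF, 0xFFFF)))   # 0xfffe0001
```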

As already mentioned, MMX lacks unsigned comparison instructions. Our
implementation needs one of them, which is emulated using existing
instructions. Let $\mathrm{Cmpleu}(a,b) = \mathrm{True}(a \leq_u b)$. Since
$\mathrm{Subus}(a,b) = 0$ exactly when $a \leq_u b$, it is easy to see that

$$\mathrm{Cmpleu}(a,b) = \mathrm{Cmpeq}(\mathrm{Subus}(a,b), 0).$$

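Lemma 2. Let $a, b \in \mathbb{Z}_{2^{16}}$. Let
$h := \lfloor (a *_u b)/2^{16} \rfloor$ and $l := (a *_u b) \mathbin{\&} (2^{16}-1)$
be calculated by the previous lemma. Then

$$a \odot b = \bigl((1 \boxminus a \boxminus b) \mathbin{\&} \mathrm{Cmpeq}(h,l)\bigr) \boxplus \bigl((1 \boxplus l \boxminus h \boxplus \mathrm{Cmpleu}(h,l)) \mathbin{\&} (\mathrm{Cmpeq}(h,l) \oplus (2^{16}-1))\bigr).$$

Proof: Since $2^{16}+1$ is prime and $a, b < 2^{16}+1$, $a *_u b = 2^{16}h + l$
can equal $(2^{16}+1)h$ only if it is zero. Hence $h = l$ iff $a *_u b = 0$, so
$\mathrm{Cmpeq}(h,l)$ selects exactly one of the two terms. If $a *_u b = 0$,
then $a = 0$ or $b = 0$, and $a \odot b = 1 \boxminus a \boxminus b$ (the first
term). Otherwise $a *_u b \equiv l - h \pmod{2^{16}+1}$ (the low-high
algorithm), so $a \odot b = l \boxminus h$ if $h <_u l$ and
$a \odot b = l \boxminus h \boxplus 1$ if $h >_u l$. This is the second term,
because $1 \boxplus \mathrm{Cmpleu}(h,l)$ equals $0$ if $h \leq_u l$ and $1$
otherwise. ■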
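Putting Lemma 1, Cmpleu and Lemma 2 together gives an emulation of $\odot$ built only from the operators of Definition 1. The Python sketch below (names ours; one 16-bit lane rather than four) compares it with the direct definition of $\odot$ on a sample of inputs, including the zero cases.

```python
MASK16 = 0xFFFF

def to_signed(x):
    return x - 0x10000 if x & 0x8000 else x

def mask(cond):                 # True(phi)
    return MASK16 if cond else 0

def add(a, b):
    return (a + b) & MASK16     # a [+] b

def sub(a, b):
    return (a - b) & MASK16     # a [-] b

def cmpeq(a, b):
    return mask(a == b)

def cmpgt(a, b):
    return mask(to_signed(a) > to_signed(b))

def subus(a, b):
    return sub(a, b) & mask(a > b)

def cmpleu(a, b):
    """Unsigned a <= b: Subus(a, b) is zero exactly when a <= b."""
    return cmpeq(subus(a, b), 0)

def mul_hl(a, b):
    """High and low words of a *u b via Lemma 1 (signed products only)."""
    s = to_signed(a) * to_signed(b)
    l = s & MASK16                                    # Mull(a, b)
    h = add(add((s >> 16) & MASK16, a & cmpgt(0, b)), b & cmpgt(0, a))
    return h, l

def idea_mul_mmx(a, b):
    """Lemma 2: a (.) b from h and l using only Definition 1 operators."""
    h, l = mul_hl(a, b)
    zero = cmpeq(h, l)                                # all ones iff a *u b == 0
    t0 = sub(sub(1, a), b) & zero
    t1 = add(sub(add(1, l), h), cmpleu(h, l)) & (zero ^ MASK16)
    return add(t0, t1)

def idea_mul_ref(a, b):
    """Direct definition: d^-1(d(a) * d(b) mod (2**16 + 1)), with d(0) = 2**16."""
    p = ((a or 0x10000) * (b or 0x10000)) % 0x10001
    return p & MASK16                                 # 2**16 is stored as 0

samples = list(range(0, 0x10000, 509)) + [1, 2, 0x8000, 0xFFFF]
mismatches = [(a, b) for a in samples for b in samples
              if idea_mul_mmx(a, b) != idea_mul_ref(a, b)]
print(len(samples) ** 2, "pairs checked,", len(mismatches), "mismatches")
```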

These formulas give a direct way to break down the $\odot$ operation into basic
operations corresponding one-to-one to MMX instructions. For example,
$\mathrm{Cmpeq}(h,l)$ corresponds to the instruction pcmpeqw,
$\mathrm{Cmpgt}(h,l)$ to pcmpgtw, $\mathrm{Subus}(a,b)$ to psubusw,
$\mathrm{Mull}(a,b)$ to pmullw and $\mathrm{Mulh}(a,b)$ to pmulhw. The given
formula for emulating 16-bit unsigned multiplication is, as far as we know, faster
than any previously published algorithm for MMX and is therefore interesting in
itself.

Including the necessary move instructions, the minimal number of MMX
instructions needed to emulate $\odot$ by the procedure given above is 26.
Additional, highly processor- (and algorithm-) dependent mechanisms let us
eliminate three more instructions per IDEA multiplication, resulting in 69
instructions per round (remember that we compute $M_1$ and $M_2$ in parallel).
Everything else (e.g., parallel additions and XORs) is accomplished in 13
instructions, so a round takes 82 instructions. The output transform takes 29
instructions and the necessary endianness conversion takes 8 instructions, so the
full IDEA has 693 instructions. After pairing, IDEA encryption takes 358 clock
cycles, or 29.7 Mbits/s on a 166MHz MMX. Note that our implementations are
not subject to timing attacks, since they contain no jump instructions and no
variable-duration instructions.
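The instruction and cycle counts above fit together as follows. The small Python tally below takes all numbers from the text; the variable names and the split into constants are ours.

```python
# Instruction and cycle budget of one IDEA encryption on the Pentium MMX
MUL_EMULATION = 26 - 3       # per (.) emulation, after saving three more instructions
MULS_PER_ROUND = 3           # M1 and M2 share one 4-way emulation, so 3 instead of 4
ROUND = MULS_PER_ROUND * MUL_EMULATION + 13   # + everything else (additions, XORs, ...)
TOTAL = 8 * ROUND + 29 + 8   # 8 rounds + output transform + endianness conversion
CYCLES = 358                 # after pairing
CLOCK_HZ = 166e6

print(MULS_PER_ROUND * MUL_EMULATION, ROUND, TOTAL)
print(f"{CYCLES / TOTAL:.3f} cycles per instruction")
print(f"{CLOCK_HZ * 64 / CYCLES / 1e6:.1f} Mbit/s")
```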

Remark 1. Schneier and Whiting have conjectured that there exists an IDEA
implementation for the Pentium with $\approx 400$ cycles. We believe this is
unrealistic for the following reason. Every round has four sequential emulations
of $\odot$. The critical path of the $\odot$ operation contains an integer
multiplication (with latency 9) and at least 6 other instructions (moving one of
the operands into the accumulator and afterwards converting the result of $*_u$
to the result of $\odot$) that cannot be paired with each other. Therefore the
multiplications take at least 60 cycles per round. The XOR and addition
operations present in the IDEA schematics cannot be paired with emulations of
$\odot$ and therefore take additional time. Adding the output transform and
endianness conversion, there seems to be no obvious way to significantly
improve on the implementation of Bosselaers.
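The bound in Remark 1 can be tallied as below. This assumes, as the remark implies, that each of the six unpairable instructions on the critical path costs at least one cycle. The names are ours.

```python
MUL_LATENCY = 9          # Pentium integer multiplication latency, in cycles
EXTRA_ON_PATH = 6        # other unpairable instructions on the critical path of one (.)
SEQUENTIAL_MULS = 4      # emulations of (.) per round, executed one after another
ROUNDS = 8

per_round = SEQUENTIAL_MULS * (MUL_LATENCY + EXTRA_ON_PATH)
all_rounds = ROUNDS * per_round
print(per_round, "cycles per round,", all_rounds, "cycles for eight rounds")
```

Even before the XORs, additions, output transform and endianness conversion are counted, this lower bound already exceeds the conjectured 400 cycles.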



5 Parallel Execution of Four IDEAs

In MMX, four 16-bit multiplications are executed simultaneously. The same is
true for every other instruction used to emulate $\odot$. During each round, three
such 4-way multiplications are done, each giving a 64-bit result, only part of
which is actually used in the implementation described in Sect. 4 (Fig. 2, left):

First multiplication ($M_1$, $M_2$). The first (bits 0...15) and the fourth (bits
48...63) words of the result are used. (This multiplication is also done during
the output transformation.)

Second multiplication. The second word (bits 16...31) of the result is used.

Third multiplication. The third word (bits 32...47) of the result is used.



[Figure: layout of the 64-bit MMX registers (bit positions 63, 48, 32, 16, 0) for each multiplication step and for the output transformation.]

Fig. 2. Use of SWAR data during IDEA (unused fields are left blank). Left:
one IDEA; right: four IDEAs in parallel.



We extended our implementation to encrypt four blocks in parallel by fully
using the results of all four multiplications at every step (Fig. 2, right; note that
such an implementation requires unparallelizing the execution of $M_1$ and
$M_2$). Two 4×4 matrix transpositions (to (un)parallelize four 64-bit blocks),
additional endianness conversions and extensive memory access (due to the lack
of registers) “slow” the implementation down to ≈ 135 cycles per IDEA
encryption. A not fully optimized implementation encrypts one IDEA block in
135.75 cycles (543 cycles / 1056 instructions for 4-way IDEA; see Table 2). This
scales up to about 212 Mbits/s on a 450 MHz Pentium II, compared to the
300 Mbits/s of the fastest (known) hardware solution, by Ascom.
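The two 4×4 transpositions rearrange four blocks so that each register holds the same word position of all four blocks. The Python sketch below models only this data movement and re-derives the quoted throughput figures from 543 cycles per four blocks. On MMX such a transposition would typically be built from unpack instructions such as punpcklwd/punpckhwd, which are not modelled here. The names are ours.

```python
def transpose4x4(rows):
    """Swap the roles of blocks and words in a 4x4 array of 16-bit words.
    Before: row j = block j. After: row i = word i of all four blocks.
    The transpose is its own inverse, so the same routine undoes it."""
    return [[rows[j][i] for j in range(4)] for i in range(4)]

# Four 64-bit plaintext blocks, each as four 16-bit words (illustrative values)
blocks = [[16 * blk + w for w in range(4)] for blk in range(4)]
lanes = transpose4x4(blocks)        # lanes[i] = word i of blocks 0..3
assert transpose4x4(lanes) == blocks

CYCLES_4WAY = 543                   # one 4-way IDEA encryption (from the text)
per_block = CYCLES_4WAY / 4         # cycles per 64-bit block
for mhz in (166, 450):
    print(f"{mhz} MHz: {mhz * 64 / per_block:.0f} Mbit/s")
```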



Cipher                                IDEA          4-way IDEA
166 MHz MMX, seconds                  8.97-9.07     3.53-3.56
166 MHz MMX, Mbits/s                  28.2-28.5     71.8-72.4
MMX, cycles (with overhead)           372-376       147-148
MMX, cycles (w/o overhead)            358           135.75
233 MHz Pentium II, seconds           7.78-7.96     2.38-2.43
233 MHz Pentium II, Mbits/s           32.2-32.9     105.1-107.2
Pentium II, cycles (with overhead)    453-464       139-142

Table 2. Test data: the “real life” throughput of IDEA-ECB on the Pentium
MMX and on the Pentium II. Seconds: the time to encrypt four million 64-bit
blocks.
6 Different Multimedia Extensions



If MMX had an unsigned multiplication instruction, the number of instructions
per IDEA multiplication would decrease by 6. If MMX had the unsigned
comparison instruction pcmpgtuw, the number of instructions per IDEA
multiplication would decrease by 2. With both of these instructions, IDEA
encryption on MMX machines could be done much faster than DES (we estimate
≈ 250-255 cycles). 4-way IDEA would be faster than Square or any of the
recently proposed AES candidate ciphers (we estimate ≈ 95-100 cycles).
Conditional move instructions, present in Cyrix’s but not in Intel’s version of
MMX, would speed up IDEA further. If even such small changes speed up a
cipher significantly, what about multimedia extensions that differ from MMX in
major aspects?

Recently, in May 1998, Motorola unveiled its new multimedia architecture,
AltiVec, claimed to be much more powerful than any of the previously mentioned
architectures. In particular, AltiVec has increased parallelism (128-bit vector
registers) and a family of instructions that perform up to eight 16-bit
(un)signed multiplications (with accumulate) in parallel. Additionally, AltiVec
has a special inter-element byte permutation instruction and several vector
rotation instructions. It therefore allows new fast ciphers to be implemented
using data-dependent rotations and byte permutations. One of the goals of
AltiVec (unlike MMX) was to accelerate data encryption algorithms. A short
comparison between MMX and AltiVec is given in Table 3.



Architecture                           MMX       AltiVec
Company                                Intel     Motorola
Year                                   1997      1999
Endianness                             little    both
Max. no. of operands                   2         4
No. of vector registers                8 (FP)    32 (separate)
Width of vector registers (bits)       64        128
8-bit parallelism                      8         16
16-bit parallelism                     4         8
32-bit parallelism                     2         4
16 x 16-bit signed multiplication      Yes       Yes
16 x 16-bit unsigned multiplication    No        Yes
Signed comparison                      Yes       Yes
Unsigned comparison                    No        Yes
Data-dependent rotation                No        Yes

Table 3. Short comparison of MMX and AltiVec.



Vector processors provide even more parallelism. Krste Asanovic has reported
an implementation of 32-way IDEA on a 40 MHz T0 vector microprocessor,
reaching 112 Mbits/s.



No “industry-standard” block cipher has that level of inner parallelism. 
7 Block Cipher Parallelization

Ciphers using S-boxes and/or lookup tables (e.g., DES, alleged RC4, SEAL,
Blowfish, Khufu) do not take major advantage of the multimedia extensions of
MMX (though they could benefit from the larger cache or word size), as MMX
registers cannot be used as memory pointers. Parallelizing these ciphers would
require accessing several “randomly” chosen memory cells simultaneously. RC5,
which does not use S-boxes, does not benefit from MMX either, because of the
expensive, non-parallelizable variable rotations involved.

It is interesting to note that some of the newest block ciphers, including the
AES candidates MARS and RC6, rely on 32-bit unsigned multiplication. The
authors’ reasoning is that such multiplication is very cheap on today’s common
microprocessors. This claim is indeed true, but MMX technology cannot be used
to accelerate these ciphers (and neither can AltiVec) because of the lack of a
32-bit parallel multiplication. There is a certain tradeoff (and even a
contradiction) here. MARS and RC6 are optimized for the new 32-bit processors
(mainly the Pentium II) and fully utilize the 32-bit operations these processors
provide. At the same time, these ciphers ignore the multimedia extensions
present in the very same processors.

Further work can be done on optimizing other conventional ciphers for the
Pentium MMX, but, as pointed out above, most commonly known block ciphers
do not benefit from MMX. Still, in some cases interleaving Pentium integer and
MMX instructions may result in some speedup. In particular, bit-slice MMX
implementations of various block ciphers should be more than twice as fast
because of the longer word size and additional logical operations.

One could think that MMX was designed “especially” to accelerate IDEA, but it
would be more correct to say that IDEA is a cipher whose key attributes are
very similar to those of multimedia applications (cf. Sect. 3), under a loose
definition of multimedia applications as applications benefiting from the Pentium
MMX (different vendors have optimized their processors for different subsets of
multimedia applications).

A family of new block ciphers can be designed to take full advantage of MMX.
A straightforward way would be to iteratively execute four copies of the IDEA
round function in parallel and then mix their outputs in a suitable way. Would it
be sufficient to apply a well-chosen 8/16-bit word permutation to the 256-bit
output of every round of this 4-way IDEA to get a secure cipher? Pseudo-Hadamard
Transforms would provide more efficient diffusion. Further research in this area
is deferred to future work. An interested reader may turn to the work in which
parallelized versions of the stream cipher Wake were proposed.
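For readers unfamiliar with it, the two-point Pseudo-Hadamard Transform maps $(a, b)$ to $(2a + b, a + b)$ and is easily inverted. The Python sketch below applies it to 16-bit words. mix_pairs, which applies it to adjacent word pairs of a 256-bit (16-word) state, is a hypothetical mixing layer for the 4-way IDEA idea above, not a proposal from the text. On its own it provides only limited diffusion.

```python
MASK16 = 0xFFFF

def pht(a, b):
    """Two-point Pseudo-Hadamard Transform on 16-bit words."""
    return (2 * a + b) & MASK16, (a + b) & MASK16

def inverse_pht(x, y):
    """Undo pht: a = x - y and b = 2y - x, both modulo 2**16."""
    return (x - y) & MASK16, (2 * y - x) & MASK16

def mix_pairs(words):
    """Hypothetical mixing layer: apply the PHT to adjacent pairs of words,
    e.g. to the 16 words (256 bits) output by one round of 4-way IDEA."""
    out = []
    for i in range(0, len(words), 2):
        out.extend(pht(words[i], words[i + 1]))
    return out

print(mix_pairs([1, 2, 3, 4]))   # [4, 3, 10, 7]
```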

A more general task is to study design principles of secure ciphers based on the
same basic operations (e.g., massively parallel 16-bit multiplication and addition
of sequential data) as existing multimedia applications. Such ciphers would
perform well on today’s microprocessors, thereby reducing the need for separate
encryption and multimedia hardware (compare the approach that uses the same
hardware for RSA and IDEA). Efficient confusion in such ciphers may be
achieved by using 16-bit multiplication mixed with other 8-bit and 16-bit
operations. Diffusion may be achieved by additionally using 32-bit and 64-bit
operations (e.g., shifts, but remember the “small data type” paradigm).

260 Helger Lipmaa

Yet another task is to study encryption modes allowing fast parallel encryption and decryption. The ECB mode can be used for both parallel encryption and decryption, but it has limited security in real-life situations. The CBC mode can be used for parallel decryption but not for parallel encryption. The resulting throughput of IDEA on a 233 MHz Pentium II in standard CBC mode would be 32 to 33 Mbit/s for encryption and 105 to 107 Mbit/s for decryption (see the throughput table). Encryption modes allowing both fast parallel encryption and decryption are needed. Note that such encryption modes are important not only for software but also for hardware architectures. The hardware solution mentioned before provides a throughput of 300 Mbit/s in ECB mode, and a throughput of 100 Mbit/s in the other modes. An example candidate is the counter mode (Sect. 7.2.2 of the cited reference), which allows parallel encryption and decryption while providing almost ideal security in the random oracle model, but which is not suited for use with differentially weak ciphers.
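To make the data dependencies concrete, here is a Python sketch contrasting CBC and counter mode. The 16-bit `toy_encrypt`/`toy_decrypt` pair is a hypothetical, insecure stand-in for IDEA, used only so the example runs; what matters is which block operations depend on earlier outputs.

```python
MASK = 0xFFFF
MUL = 0x9E37                     # odd constant, hence invertible modulo 2**16
MUL_INV = pow(MUL, -1, 1 << 16)  # modular inverse (needs Python 3.8+)


def toy_encrypt(key, block):
    """Hypothetical 16-bit block 'cipher' (NOT secure); stands in for IDEA."""
    return ((block ^ key) * MUL + 1) & MASK


def toy_decrypt(key, block):
    """Inverse of toy_encrypt."""
    return (((block - 1) * MUL_INV) & MASK) ^ key


def cbc_encrypt(key, iv, plaintext):
    # Sequential: block i cannot be encrypted before C[i-1] is known.
    out, prev = [], iv
    for p in plaintext:
        prev = toy_encrypt(key, p ^ prev)
        out.append(prev)
    return out


def cbc_decrypt(key, iv, ciphertext):
    # Parallelizable: P[i] = D(C[i]) ^ C[i-1] uses only ciphertext blocks,
    # so the toy_decrypt calls are independent of each other.
    prevs = [iv] + ciphertext[:-1]
    return [toy_decrypt(key, c) ^ prev for c, prev in zip(ciphertext, prevs)]


def ctr_crypt(key, nonce, data):
    # Counter mode: keystream block i depends only on (nonce + i), so both
    # directions are parallelizable; the same function encrypts and decrypts.
    return [d ^ toy_encrypt(key, (nonce + i) & MASK) for i, d in enumerate(data)]


msg = [0x0001, 0x0203, 0xBEEF, 0x1234]
ct = cbc_encrypt(0x2B7E, 0x1111, msg)
assert cbc_decrypt(0x2B7E, 0x1111, ct) == msg
assert ctr_crypt(0x2B7E, 0x0100, ctr_crypt(0x2B7E, 0x0100, msg)) == msg
```

In a 4-way MMX implementation, the independent calls in `cbc_decrypt` and `ctr_crypt` are the ones that could be grouped four at a time; the loop in `cbc_encrypt` cannot.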

One could also see the problem from the viewpoint of a processor designer and ask what (minimal) extensions should be added to an existing general-purpose processor to achieve a significant speedup of industry-standard cryptographic primitives. While the general answer seems to be out of our reach due to the diversity of cryptographic primitives, suggestions can be given to accelerate any fixed primitive (see the discussion at the beginning of the corresponding section).



8 Conclusion 

We have shown that it is possible to speed up the IDEA block cipher significantly by using the MMX extensions of Intel’s Pentium processor. This is remarkable when taking into account the unfriendliness of the MMX instruction set. Our fast implementation is

- 1.65 times faster than the best known assembler implementation on the Pentium, by Antoon Bosselaers,

- ≈ 2.55 times faster than the C version on the Pentium in the popular library SSLeay v0.90b, when compiled with egcs 1.0.2 and full optimization.

By parallelizing four IDEAs, the encryption speed is increased by a factor of about 2.64, giving a total acceleration of 4.35 times compared to the implementation of Bosselaers. Implications (including massively parallel key search) of using such parallel versions of conventional ciphers were already described in earlier work and are not repeated in this paper.

By noting that most of today's “industry-standard” block ciphers do not benefit from MMX, we raise the problem of designing block ciphers (and encryption modes) that fully utilize the basic operations of multimedia.
Abstract. In this paper, we study a strategy for constructing fast and practically secure round functions that yield sufficiently small values of the maximum differential and linear probabilities p, q. We consider mn-bit round functions with a 2-round SPN structure for Feistel ciphers.

In this strategy, we regard a linear transformation layer as an n × n matrix P over {0,1}. We describe the relationship between the matrix representation and the actual construction of the linear transformation layer. We propose a search algorithm for constructing the optimal linear transformation layer by using the matrix representation in order to minimize the probabilities p, q as much as possible. Furthermore, with this algorithm, we determine the optimal linear transformation layer that provides $p \le p_s^5$, $q \le q_s^5$ in the case of n = 8, where $p_s$, $q_s$ denote the maximum differential and linear probabilities of the s-box.



1 Introduction 

1.1 Background 

Differential cryptanalysis (DC), proposed by Biham and Shamir, and linear cryptanalysis (LC), proposed by Matsui, are the most powerful approaches to attacking most symmetric block ciphers. Accordingly, the designer should evaluate the security of any newly proposed symmetric cipher against DC/LC and prove that it is sufficiently invulnerable against them. Four measures are known for evaluating the security of a cipher against DC/LC.

Precise measure The maximum average of differential and linear probabilities. They are also called the differential probability and the approximate linear hull.
Lai et al. and Nyberg stated that the precise evaluation of the security of a cipher against DC/LC should be done using this measure. Generally speaking, however, since it is infeasible to evaluate these probabilities, this measure is not practical.

Theoretical measure The upper bounds of the maximum average of differential and linear probabilities. This measure was applied to evaluate the security of MISTY and the cipher $\mathcal{KN}$.

These probabilities are evaluated as $2p^2$, $2q^2$ for an R-round Feistel cipher ($R \ge 4$), where p, q are the maximum differential and linear probabilities of the round function. Furthermore, for an R-round Feistel cipher with a bijective round function ($R \ge 3$), they are evaluated as $p^2$, $q^2$, respectively. Nyberg and Knudsen stated that Feistel ciphers evaluated with this measure are provably secure against DC/LC, which means that they are theoretically invulnerable to DC/LC, if these probabilities are sufficiently small. However, when designers intend to construct provably secure ciphers, they have to construct round functions yielding extremely small values of the probabilities p, q. This is a very strong constraint on the design of round functions.

Heuristic measure The maximum differential and linear characteristic probabilities. This measure was applied to estimate the security of DES, FEAL, and so on.

Biham and Shamir claimed that the larger the differential characteristic probability, the higher the success rate of DC, since they exploited a single path between plaintexts and ciphertexts that holds with significant differential characteristic probability. Matsui claimed the same for LC.

However, these probabilities only give lower bounds of the maximum average of differential and linear probabilities for some ciphers, since in practice there are multiple paths between the same plaintexts and ciphertexts. Furthermore, estimating them takes much time, since it involves the use of path-searching algorithms.

Practical measure The upper bounds of the maximum differential and linear characteristic probabilities.

As shown in (3) and (4) in Sect. 2 below, given the maximum differential and linear probabilities p, q of the round function, these bounds are decreasing functions of the number of rounds of a Feistel cipher. Knudsen noted that Feistel ciphers evaluated with this measure are practically secure against DC/LC, i.e., they are believed to be invulnerable against DC/LC if these probabilities are less than a security criterion, e.g., $2^{-64}$ for 64-bit ciphers and $2^{-128}$ for 128-bit ciphers. This implies that designers can construct practically secure Feistel ciphers with a moderate number of rounds from good round functions, while also taking into account invulnerability to other known attacks and implementation efficiency.

Heys and Tavares studied the upper bounds of the maximum differential and linear characteristic probabilities of traditional SPNs. They focused on diffusion layers constructed from single-bit operations. However, we consider that SPN structures whose diffusion layers are built from word operations (e.g., on 8-bit words) should be studied, since bit operations are hard to implement in software. Of interest, Rijmen et al. introduced the branch number $\mathcal{B}$ for SPN ciphers, which is a lower bound on the number of active s-boxes in two consecutive rounds of a non-trivial linear trail or a non-trivial differential characteristic. The branch number $\mathcal{B}$ is very similar in concept to these probabilities, since the security of an SPN cipher against DC/LC is evaluated by piling up the branch numbers every two rounds.

Of course, new ciphers have to be invulnerable against other known attacks, e.g., the higher order differential attack and the interpolation attack. Unfortunately, all the above measures evaluate the security against DC/LC only. That is, these evaluations do not prove that the cipher is also secure against other known attacks. For example, the 6-round cipher $\mathcal{KN}$, which was evaluated by the theoretical measure, was broken by the higher order differential attack, even though its maximum average of differential and linear probabilities is less than $2^{-61}$, which means that the cipher is provably secure against DC/LC.

Moreover, it is known empirically that, for good round functions, increasing the number of rounds makes the cipher more secure against most known attacks. Therefore, we consider the number of rounds to be more important for constructing a secure cipher than the values of the maximum differential and linear probabilities p, q of the round function. Thus, we believe that the practical measure is more useful than the theoretical measure for evaluating the security of a cipher against DC/LC.



1.2 Design 

In the following, we discuss round functions of Feistel ciphers that are practically secure against DC/LC. Our design strategy is as follows:

(a) To enable the maximum differential and linear probabilities of the round function to be evaluated, and to prove our cipher to be sufficiently invulnerable against DC/LC.

(b) To realize a fast round function.

(c) To enable it to be implemented efficiently on multiple platforms, e.g., 8-bit, 16-bit, 32-bit, and 64-bit processors.

As mentioned above, we believe that the number of rounds is more important in constructing secure ciphers than the values of the maximum differential and linear probabilities p, q of the round function. Unfortunately, although increasing the number of rounds makes the cipher more secure, it also reduces the encryption speed. This means that it is very difficult to construct a fast cipher that has a sufficient number of rounds, even if the round function is fast. Thus, we believe that our cipher should have a reasonable number of rounds and suitable differential and linear probabilities p, q of the round function.

Furthermore, to satisfy strategy (c), we intend to use s-boxes as non-linear transformations, since realizing s-boxes by table look-up places no limitation on the choice of function, whereas the software instructions for arithmetic operations depend on the architecture.

Thus, in this paper, we consider an mn-bit round function with m-bit s-boxes¹ to construct a fast and practically secure Feistel cipher. It is known that, for Feistel ciphers, the major current round functions with s-boxes are based on (A) the 1-round SPN structure or (B) the recursive structure. Examples of the former are DES, CAST, and LOKI91; an example of the latter is MISTY.

Next, we focus on the total number of s-boxes in a cipher, i.e., how many s-boxes the cipher contains, not how many different s-boxes there are in it. Roughly speaking, the encryption time is proportional to the total number of s-boxes, under the assumptions that the run time of everything other than s-box processing is negligible and that parallel processing is not taken into account. That is, in order to construct a fast cipher, the total number of s-boxes in it should be as small as possible. On the other hand, in order to construct a practically secure cipher, we must ensure that it contains as many active s-boxes as the designer specifies.

(A) 1-round SPN Structure This structure has one non-linear transformation layer with n parallel m-bit s-boxes and one linear transformation (permutation) layer.

Since the minimum number of active s-boxes in this structure is 1, the maximum differential and linear probabilities p, q of the round function are equal to the maximum differential and linear probabilities $p_s$, $q_s$ of the s-box, i.e., $p = p_s$, $q = q_s$. It is known that $p_s$, $q_s$ are equal to or larger than $2^{-m+1}$ when m is odd, or $2^{-m+2}$ when m is even. This means that too many rounds are required in order to construct a practically secure Feistel cipher; e.g., 33 rounds are required for a 128-bit Feistel cipher with a 64-bit bijective round function using 8-bit s-boxes whose maximum differential and linear probabilities are $p_s = q_s = 2^{-6}$.

Nyberg investigated the upper bounds of the maximum differential and linear probabilities of a generalized Feistel cipher with 1-round SPN structure, where “generalized” refers to the generalized Feistel structure, not to a generalized linear layer. That is, unlike our research, that work does not discuss the structure of the round function. Furthermore, it only derived the order of the upper bounds of the maximum differential and linear probabilities of a cipher in the case of $n < 4$.

(B) Recursive Structure The recursive structure consists of nested functions 
with different input bit lengths. 

For example, consider the 3-round recursive structure. In this case, since the 
number of s-boxes triples with each recursion, it is difficult to construct fast 
round functions because the number of s-boxes is too large. Furthermore, in- 
creasing the number of rounds rapidly increases the total number of s-boxes. 
Generally speaking, though the maximum differential and linear probabili- 
ties of round function p, q are extremely small, the number of rounds must 
be at least 8 in order to provide invulnerability against other known attacks. 

¹ In this paper, we do not consider expansion permutations in round functions.
We consider that too many s-boxes would be required in a practically secure cipher using the 1-round SPN structure, since the maximum differential and linear probabilities p, q of the round function are not sufficiently small. On the other hand, we also consider that it is difficult to increase the number of rounds of a cipher that uses the recursive structure, since the number of s-boxes in the round function is too large. Accordingly, we need a new round function structure that makes the probabilities p, q much smaller than those achieved with the 1-round SPN structure and that requires far fewer s-boxes than the recursive structure.



1.3 2-round SPN Structure Approach 

Our strategy is based on using mn-bit round functions consisting of three layers (see Fig. 1): a 1st non-linear transformation layer with n parallel m-bit s-boxes, a linear transformation layer, and a 2nd non-linear transformation layer with n parallel m-bit s-boxes, in this order. We call this the 2-round SPN structure hereafter. Each layer has the following features.




Fig. 1. 2-round SPN structural round function (layers: 1st nonlinear transformation layer, linear transformation layer, 2nd nonlinear transformation layer)



1st non-linear transformation layer n parallel m-bit s-boxes are arranged in a DES-like manner. In the following, we call this the “1st non-linear layer.”

2nd non-linear transformation layer n parallel m-bit s-boxes are arranged as in the 1st non-linear layer. With the addition of this layer, the maximum differential and linear probabilities of the round function become much smaller. In the following, we call this the “2nd non-linear layer.”

Linear transformation layer This layer makes the maximum differential and linear probabilities as small as possible for all non-zero input differences and all non-zero output mask values of the round function². Here, we consider that this layer is constructed only with bitwise XORs. That is, inputs are transformed linearly to outputs per m bits, i.e., per byte in the case of m = 8. In the following, we call this the “linear layer.”

² Vaudenay proposed using multipermutations as diffusion (permutation) layers. A multipermutation is a good cryptographic primitive, but it is very hard to satisfy the multipermutation requirements and to build multipermutations of large bit size.

Note that all round subkeys are XORed bitwise with the data before each s-box, so that the cipher is a Markov cipher. Accordingly, we do not consider a separate key addition layer here.
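The data flow of Fig. 1 can be sketched in Python with toy parameters that are assumptions, not part of the text: n = 4 subblocks of m = 4 bits, the PRESENT 4-bit s-box as a stand-in for both non-linear layers, and the 4 × 4 example matrix $P_e$ of Sect. 3 as the linear layer. Subkeys are XORed in before each s-box layer, as described above. It only illustrates the structure, not a secure round function.

```python
N, M = 4, 4  # hypothetical toy sizes: n = 4 subblocks of m = 4 bits
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # PRESENT s-box, a stand-in only
P_E = [[0, 1, 1, 1],
       [1, 0, 1, 1],
       [1, 1, 1, 0],
       [1, 1, 1, 1]]


def split(x):
    """Split an mn-bit integer into subblocks z_0 .. z_{n-1} (z_0 most significant)."""
    return [(x >> (M * (N - 1 - i))) & ((1 << M) - 1) for i in range(N)]


def join(z):
    """Inverse of split."""
    x = 0
    for zi in z:
        x = (x << M) | zi
    return x


def linear_layer(z):
    """z'_i = XOR of all z_j with t_ij = 1 (bitwise XORs only)."""
    out = []
    for row in P_E:
        acc = 0
        for t, zj in zip(row, z):
            if t:
                acc ^= zj
        out.append(acc)
    return out


def round_function(x, k1, k2):
    """Key XOR + 1st non-linear layer, linear layer, key XOR + 2nd non-linear layer."""
    z = [SBOX[v] for v in split(x ^ k1)]
    z_prime = linear_layer(z)
    return join([SBOX[v] for v in split(join(z_prime) ^ k2)])
```

Because $P_e$ is invertible over GF(2) and the stand-in s-box is bijective, this toy round function is a permutation of its 16-bit input for fixed subkeys.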

1.4 Result 

For the 2-round SPN structure, an interesting issue is how to construct the 
optimal linear layer, since the two non-linear layers are already set. To construct 
the optimal linear layer, we represent a linear layer as an n x n matrix P over 
{0, 1}, and describe how to determine the matrix elements in order to minimize 
the differential and linear probabilities of round function p, q as much as possible. 

In this paper, we propose a new search algorithm based on the matrix representation for constructing the optimal linear layer. Furthermore, with this algorithm, we determine the optimal linear transformation layer that provides the probabilities $p \le p_s^5$, $q \le q_s^5$ for n = 8, where $p_s$, $q_s$ are the maximum differential and linear probabilities of the m-bit s-boxes in the non-linear layers.

As shown in (3) and (4) in Sect. 2 below, the number of rounds required to construct a practically secure cipher depends on the maximum differential and linear probabilities p, q of the round function. We show that the round function with the 2-round SPN structure requires one-fourth as many rounds as the 1-round SPN structure to achieve the same differential and linear probabilities, while each round has double the number of s-boxes. Hence, the round function using the 2-round SPN structure is twice as efficient as that using the 1-round SPN structure.

1.5 Organization 

This paper is organized as follows. In Sect. 2, we introduce some preliminary 
definitions. In Sect. 3, we describe the relationship between the matrix represen- 
tation and the actual construction of the linear transformation layer. We propose 
a new search algorithm for constructing the optimal linear layer. In Sect. 4, we 
show the optimal linear layer for the case of n = 8. Finally, we conclude with a 
summary in Sect. 5. 

2 Preliminaries 

2.1 Notations 

- $\#\{S\}$: number of elements in set S.

- $\Delta x$, $\Gamma x$: difference of x, mask value of x.

- $a \bullet b$: parity of the bitwise product of a and b.

- $P = [t_{ij}]$, $t_{ij} \in \{0, 1\}$, $0 \le i, j < n$: matrix representation of a linear layer

- $z = {}^t[z_0, \ldots, z_{n-1}]$, $z_i \in GF(2)^m$, $0 \le i < n$: an input to a linear layer

- $z' = {}^t[z'_0, \ldots, z'_{n-1}] = Pz$, $z'_i \in GF(2)^m$, $0 \le i < n$: an output from a linear layer
2.2 Definitions

In this paper, we consider Feistel ciphers with mn-bit round function. We assume 
that a round key, which is used within one round, consists of independent and 
uniformly random bits and is bitwise XORed with data. Furthermore, we assume 
that an input also consists of independent and uniformly random bits and that 
all m-bit s-boxes work independently. Accordingly, we neglect the effect of the 
round key. 

We use the following definitions in this paper. 

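Definition 1. For given $\Delta x$, $\Gamma x$ and $\Delta y$, $\Gamma y$, the differential and linear probabilities of an s-box are defined as:

$$DP^s(\Delta x \to \Delta y) = \frac{\#\{x \in GF(2)^m \mid s(x) \oplus s(x \oplus \Delta x) = \Delta y\}}{2^m}$$

$$LP^s(\Gamma y \to \Gamma x) = \left(\frac{\#\{x \in GF(2)^m \mid x \bullet \Gamma x = s(x) \bullet \Gamma y\}}{2^{m-1}} - 1\right)^2$$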

Definition 2. The maximum differential and linear probabilities of an s-box, $p_s$, $q_s$, are defined as:

$$p_s = \max_{\Delta x \neq 0,\ \Delta y} DP^s(\Delta x \to \Delta y), \qquad q_s = \max_{\Gamma x,\ \Gamma y \neq 0} LP^s(\Gamma y \to \Gamma x)$$
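Definitions 1 and 2 can be evaluated exhaustively for small s-boxes. The Python sketch below (helper names are hypothetical) computes $p_s$ and $q_s$ directly from the counting formulas; the 4-bit PRESENT s-box is used only as a convenient test case, not an s-box from this paper.

```python
def parity(v):
    """Parity of the bits of v; a • b is parity(a & b)."""
    return bin(v).count("1") & 1


def max_dp(sbox, m):
    """p_s: max over Δx != 0 and Δy of #{x : s(x) ^ s(x ^ Δx) = Δy} / 2^m."""
    size = 1 << m
    best = 0
    for dx in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1
        best = max(best, max(counts))
    return best / size


def max_lp(sbox, m):
    """q_s: max over Γx and Γy != 0 of (#{x : x•Γx = s(x)•Γy} / 2^(m-1) - 1)^2."""
    size = 1 << m
    best = 0.0
    for gy in range(1, size):
        for gx in range(size):
            agree = sum(1 for x in range(size)
                        if parity(x & gx) == parity(sbox[x] & gy))
            best = max(best, (agree / (size // 2) - 1) ** 2)
    return best


PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
print(max_dp(PRESENT_SBOX, 4), max_lp(PRESENT_SBOX, 4))  # 0.25 0.25
```

Both loops are exhaustive over $2^{2m}$ pairs, which is fine for m = 8 as well, though it then takes noticeably longer in pure Python.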

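Definition 3. For given $\Delta x$, $\Gamma x$ and $\Delta y$, $\Gamma y$, the differential and linear probabilities of a round function with the 2-round SPN structure are defined as:

$$p(\Delta x \to \Delta y) = \max_{\Delta z} \prod_{i=0}^{n-1} DP^s(\Delta x_i \to \Delta z_i)\, p(\Delta z \to \Delta z'_i)\, DP^s(\Delta z'_i \to \Delta y_i)$$

$$q(\Gamma y \to \Gamma x) = \max_{\Gamma z'} \prod_{i=0}^{n-1} LP^s(\Gamma y_i \to \Gamma z'_i)\, p(\Gamma z' \to \Gamma z_i)\, LP^s(\Gamma z_i \to \Gamma x_i)$$

where $\Delta x$ denotes $(\Delta x_0, \ldots, \Delta x_{n-1})$. Also, $\Delta y$, $\Gamma x$, and $\Gamma y$ are denoted in the same way as $\Delta x$. $\Delta z$, $\Gamma z$ denote the input difference and mask value of the linear layer, i.e., $(\Delta z_0, \ldots, \Delta z_{n-1})$, $(\Gamma z_0, \ldots, \Gamma z_{n-1})$, respectively. Similarly, $\Delta z'$, $\Gamma z'$ denote the output difference and mask value of the linear layer, respectively. Here, for each i, $\Delta z$ is transformed into $\Delta z'_i$ by the linear layer, $\{GF(2)^m\}^n \to GF(2)^m$; i.e., for each i, there exists $\Delta z'_i$ such that $p(\Delta z \to \Delta z'_i) = 1$. Similarly, there exists $\Gamma z_i$ such that $p(\Gamma z' \to \Gamma z_i) = 1$.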

Definition 4. The maximum differential and linear probabilities p, q of the round function are defined as:

$$p = \max_{\Delta x \neq 0,\ \Delta y} p(\Delta x \to \Delta y), \qquad q = \max_{\Gamma x,\ \Gamma y \neq 0} q(\Gamma y \to \Gamma x)$$
Definition 5. An active s-box is defined as an s-box given a non-zero input difference or a non-zero output mask value.

Note: When an s-box is bijective, an s-box given a non-zero output difference or a non-zero input mask value is also an active s-box.

Definition 6. Assume that all s-boxes are bijective. The minimum numbers of active s-boxes $n_d$, $n_l$ in a round function with the 2-round SPN structure for DC and LC are defined as:

$$n_d = \min_{\Delta z \neq 0} \left[w_h(\Delta z) + w_h(\Delta z')\right], \qquad n_l = \min_{\Gamma z' \neq 0} \left[w_h(\Gamma z) + w_h(\Gamma z')\right] \qquad (1)$$

where $w_h(z)$ denotes the Hamming weight of z, i.e., the number of non-zero subblocks of z from $GF(2)^m$: $w_h(z) = \#\{0 \le i < n \mid z_i \neq 0\}$.

Theorem 1. Let $n_d$, $n_l$ denote the minimum numbers of active s-boxes in a round function for DC and LC, respectively. Then, the probabilities p, q satisfy

$$p \le p_s^{n_d}, \qquad q \le q_s^{n_l} \qquad (2)$$



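Definition 7. For an R-round Feistel cipher, assume that $X_i$ ($0 \le i \le R + 1$), which is an input to the i-th round function, is an independent random variable. The maximum differential and linear characteristic probabilities $DCP^{(R)}_{max}$, $LCP^{(R)}_{max}$ are defined as:

$$DCP^{(R)}_{max} = \max_{(\Delta X_0, \Delta X_1) \neq 0,\ (\Delta X_R, \Delta X_{R+1})} \prod_{i=1}^{R} p(\Delta X_i \to \Delta X_{i-1} \oplus \Delta X_{i+1})$$

$$LCP^{(R)}_{max} = \max_{(\Gamma X_0, \Gamma X_1),\ (\Gamma X_R, \Gamma X_{R+1}) \neq 0} \prod_{i=1}^{R} q(\Gamma X_i \to \Gamma X_{i-1} \oplus \Gamma X_{i+1})$$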


Theorem 2. For a Feistel cipher with R = 2r, 2r + 1 rounds, the upper bounds of the maximum differential and linear characteristic probabilities are estimated as follows:

$$DCP^{(R)}_{max} \le p^r, \qquad LCP^{(R)}_{max} \le q^r \qquad (3)$$

Theorem 3. For a Feistel cipher with R = 3r, 3r + 1, 3r + 2 rounds and a bijective round function, the upper bounds of the maximum differential and linear characteristic probabilities are estimated as follows:

$$\text{In case of } R = 3r, 3r+1: \quad DCP^{(R)}_{max} \le p^{2r}, \quad LCP^{(R)}_{max} \le q^{2r}$$

$$\text{In case of } R = 3r+2: \quad DCP^{(R)}_{max} \le p^{2r+1}, \quad LCP^{(R)}_{max} \le q^{2r+1} \qquad (4)$$
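Theorems 2 and 3 turn a round-function probability into a round count. The Python sketch below is a hypothetical helper working with base-2 logarithms; it reproduces the 33-round figure quoted in Sect. 1.2 for $p = p_s = 2^{-6}$ and, assuming $p \le p_s^5 = 2^{-30}$ as obtained for n = 8 in Sect. 4, the count for the 2-round SPN structure. It only tells when the upper bound falls below the criterion; it is not a security proof.

```python
def bound_exponent(rounds, bijective):
    """Exponent e with DCP_max <= p**e and LCP_max <= q**e after `rounds` rounds."""
    if bijective:                      # Theorem 3
        r, rem = divmod(rounds, 3)
        return 2 * r + (1 if rem == 2 else 0)
    return rounds // 2                 # Theorem 2: R = 2r or 2r + 1 gives p**r


def min_rounds(neg_log2_p, neg_log2_target, bijective=True):
    """Smallest R whose bound p**e is at most 2**-neg_log2_target,
    where p = 2**-neg_log2_p is the round-function probability."""
    if neg_log2_p <= 0:
        raise ValueError("p must be smaller than 1")
    rounds = 1
    while bound_exponent(rounds, bijective) * neg_log2_p < neg_log2_target:
        rounds += 1
    return rounds


print(min_rounds(6, 128))   # 33: 1-round SPN with p = p_s = 2**-6
print(min_rounds(30, 128))  # 8: assuming p <= p_s**5 = 2**-30
```

For the non-bijective bound of Theorem 2, `min_rounds(6, 128, bijective=False)` gives 44 instead of 33.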

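Theorem 4 (Concatenation Rules). Let “$X \prec (Y, Z)$” denote that X branches into Y and Z, i.e., X = Y = Z. Then the following relations hold.

$X = Y \oplus Z \ \Rightarrow\ \Delta X = \Delta Y \oplus \Delta Z,\ \Gamma X = \Gamma Y = \Gamma Z$ (XOR operation)

$X \prec (Y, Z) \ \Rightarrow\ \Delta X = \Delta Y = \Delta Z,\ \Gamma X = \Gamma Y \oplus \Gamma Z$ (BRANCH operation)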
3 Relationship between Matrix Representation and Structure

We represent the linear layer as an n × n matrix P over {0, 1}. This means that inputs are transformed linearly to outputs per m bits, i.e., per byte in the case of m = 8, and the linear layer can be constructed with bitwise XORs only. For example,

$$z'_i = \bigoplus_{\substack{0 \le j < n \\ t_{ij} = 1}} z_j,$$

where $t_{ij}$ denotes the element of matrix P in row i and column j.

We assume that the maximum differential and linear probabilities of the s-box are $p_s$, $q_s$. From (2) in Theorem 1, evaluating the upper bounds of the maximum differential and linear probabilities p, q of the round function is equivalent to counting the minimum numbers of active s-boxes $n_d$, $n_l$. On the other hand, the optimal linear layer leads to the optimal round function, i.e., the one for which the upper bounds of the probabilities are minimum. Accordingly, constructing the optimal linear layer is equivalent to determining the matrix elements $P = [t_{ij}]$ yielding the maximum values of $n_d$, $n_l$, because $p_s < 1$, $q_s < 1$. Here, we note that $n_d$, $n_l$ are obviously at most n + 1, i.e., $n_d, n_l \le n + 1$, from (1) in Definition 6: an input with a single non-zero subblock gives one active s-box in the first layer and, because $w_h(z) \le n$, at most n in the second.

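For example, we consider the 4 × 4 matrix $P_e$ such that

$$\begin{pmatrix} z'_0 \\ z'_1 \\ z'_2 \\ z'_3 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} z_0 \\ z_1 \\ z_2 \\ z_3 \end{pmatrix}$$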



3.1 Relationship between Matrix Representation and Features of Round Function

The round function using the linear layer represented as matrix $P_e$ has the following features. Here, we assume that all s-boxes are bijective.

- We obtain $z'_0 = 0 \cdot z_0 \oplus 1 \cdot z_1 \oplus 1 \cdot z_2 \oplus 1 \cdot z_3 = z_1 \oplus z_2 \oplus z_3$. Similarly, $z'_1$, $z'_2$, $z'_3$ can be represented by XORs among $z_0, z_1, z_2, z_3$. Furthermore, the differential characteristic can be represented in the same way.

- The matrix $P_e$ represents which inputs of the linear layer are combined into each output of the linear layer. Each row corresponds to an output of the linear layer, and each column corresponds to an input of the linear layer; see Fig. 2.

- Invulnerability against DC and LC can be evaluated as the minimum numbers of active s-boxes $n_d$, $n_l$ in the round function, respectively. $n_d$, $n_l$ can be obtained with the search algorithm described in Sect. 3.2.

- The Hamming weight of a column vector reflects the avalanche effect. A large Hamming weight means that the round function has a good avalanche effect.
Fig. 2. Relationship between matrix representation and linear layer




As mentioned above, the preliminary features of the round function, i.e., invulnerability against DC/LC and the avalanche effect, are determined by the matrix elements alone, regardless of how the linear layer is constructed.

3.2 Determination of Matrix Elements 

Generally speaking, for a given matrix P, there are many constructions of the linear layer that correspond to P. This is because matrix P denotes only the relationship between the inputs and outputs of the linear layer, not the construction of the linear layer. That is, when several linear layers can be represented by the same matrix P, the resulting round functions have the same features regardless of the constructions of the linear layers.

Accordingly, we first determine the matrix elements so as to provide good invulnerability against DC/LC and a good avalanche effect, and then realize the optimal linear layer.



Consideration of Differential Characteristic Based on the nature of the differential characteristic, the elements of the n × n matrix P are determined by the following algorithm.

Search Algorithm

Step 1 Define a security threshold T ($2 \le T \le n$).

Step 2 Prepare the set C of column vectors whose Hamming weights are equal to or larger than T − 1.

Step 3 Select a subset $P_c$ of n column vectors from set C. Repeat the following Steps 3-1 and 3-2 until all subsets have been checked.

Step 3-1 Compute the minimum number of active s-boxes $n_d$ for subset $P_c$, denoted $n_d(P_c)$.

Step 3-2 If $n_d(P_c) \ge T$, then accept the matrix consisting of subset $P_c$ as a candidate matrix.

Step 4 Output the matrix P and $n_d(P)$ that yield the maximum value of $n_d$ among the candidate matrices.
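A brute-force Python sketch of the search algorithm, including the rank filter that Sect. 4.1 adds to Step 3, is shown below. Assumptions not stated in the text: columns are stored as bitmasks over the rows (bit i is $t_{ij}$), and $n_d$ is computed by enumerating binary input patterns only. This relies on the observation that, for a matrix over {0, 1} acting on m-bit subblocks, each bit position evolves independently, so a single non-zero bit plane already attains the minimum in (1). Exhaustive enumeration is only practical for small n such as n = 4; the n = 8 search of Sect. 4 needs far more work than this sketch.

```python
from itertools import combinations


def popcount(v):
    return bin(v).count("1")


def n_d(columns):
    """min over non-zero inputs of w_h(Δz) + w_h(Δz'), where Δz' is the XOR
    of the columns selected by the non-zero input positions."""
    n = len(columns)
    best = None
    for pattern in range(1, 1 << n):
        out = 0
        for j in range(n):
            if (pattern >> j) & 1:
                out ^= columns[j]
        total = popcount(pattern) + popcount(out)
        if best is None or total < best:
            best = total
    return best


def gf2_rank(vectors):
    """Rank over GF(2) of integer bit vectors (Gaussian elimination on leading bits)."""
    basis = {}
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)


def search(n, T, require_full_rank=True):
    candidates = [c for c in range(1, 1 << n) if popcount(c) >= T - 1]  # Step 2
    best_nd, best_cols = None, None
    for cols in combinations(candidates, n):                           # Step 3
        if require_full_rank and gf2_rank(cols) != n:                  # Sect. 4.1
            continue
        nd = n_d(cols)                                                 # Step 3-1
        if nd >= T and (best_nd is None or nd > best_nd):              # Steps 3-2, 4
            best_nd, best_cols = nd, cols
    return best_nd, best_cols


P_E_COLUMNS = [0b1110, 0b1101, 0b1111, 0b1011]  # columns of P_e, bit i = row i
print(n_d(P_E_COLUMNS))  # 3
print(search(4, 3)[0])   # 4
```

For n = 4 the search finds $n_d = 4$, one more than $P_e$ achieves.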
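The security threshold T is used to restrict the candidate subsets in Step 2 and Step 3-2. In Step 2, a column of Hamming weight below T − 1 can be discarded at once, since a single active s-box in the 1st non-linear layer feeding that column would give fewer than T active s-boxes in total. If candidate matrices are found by the above algorithm, then it can be proven that the minimum number of active s-boxes is equal to or larger than T. That is, $p \le p_s^T$.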

Consideration of Linear Expression Similarly, the linear expression represents which output mask values of the linear layer are combined to yield each input mask value of the linear layer. The linear expression may be obtained with the concatenation rules in Theorem 4 when the construction of a linear layer is given. If each input mask value of the linear layer is represented by XORs among output mask values of the linear layer, the result is another transformation matrix, similar to the one yielded by the differential characteristic.

Here, based on our study of the relationship between the matrix for differ- 
ential characteristic and the matrix for linear expression, we make the following 
two conjectures. 

Conjecture 1. Let P be an n × n matrix over {0, 1} for a linear layer. The matrix for the differential characteristic, i.e., the relationship between input and output differences, is the same matrix P, while the matrix for the linear expression, i.e., the relationship between output and input mask values, is the transposed matrix ${}^tP$. That is, $\Delta z' = P\,\Delta z$, $\Gamma z = {}^tP\,\Gamma z'$.

Conjecture 2. The minimum number of active s-boxes $n_d$ for the differential characteristic represented as matrix P is equal to the minimum number of active s-boxes $n_l$ for the linear expression represented as the transposed matrix ${}^tP$.
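The mask relation in Conjecture 1 can be spot-checked numerically: for $z' = Pz$, the parity $\Gamma z' \bullet z'$ should equal $({}^tP\,\Gamma z') \bullet z$ for every input z. The Python sketch below samples random byte-valued subblocks (m = 8) for $P_e$ and for a random 8 × 8 matrix. It is a spot check under these assumptions, not a proof, and the helper names are hypothetical.

```python
import random

P_E = [[0, 1, 1, 1],
       [1, 0, 1, 1],
       [1, 1, 1, 0],
       [1, 1, 1, 1]]


def apply_matrix(matrix, z):
    """out_i = XOR of z_j over all j with t_ij = 1 (subblocks of m bits)."""
    out = []
    for row in matrix:
        acc = 0
        for t, zj in zip(row, z):
            if t:
                acc ^= zj
        out.append(acc)
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def dot(masks, values):
    """Parity of (mask_0 • value_0) XOR ... XOR (mask_{n-1} • value_{n-1})."""
    acc = 0
    for g, v in zip(masks, values):
        acc ^= g & v
    return bin(acc).count("1") & 1


def check_mask_transpose(matrix, m=8, trials=1000, seed=1):
    """True if Γz'•(Pz) == (tP Γz')•z for all sampled z and Γz'."""
    rng = random.Random(seed)
    n = len(matrix)
    pt = transpose(matrix)
    for _ in range(trials):
        z = [rng.randrange(1 << m) for _ in range(n)]
        gamma_out = [rng.randrange(1 << m) for _ in range(n)]  # Γz'
        gamma_in = apply_matrix(pt, gamma_out)                 # Γz = tP Γz'
        if dot(gamma_out, apply_matrix(matrix, z)) != dot(gamma_in, z):
            return False
    return True


print(check_mask_transpose(P_E))  # True
```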

Because of Conjecture 2, the minimum number of active s-boxes $n_l$ is also equal to or larger than T when candidate matrices are found by the above algorithm. That is, $q \le q_s^T$. For example, matrix $P_e$ and its transpose are as follows; it is proven that $n_d = 3$ and $n_l = 3$ for matrices $P_e$ and ${}^tP_e$, respectively.

$$P_e = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \qquad {}^tP_e = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{pmatrix}$$

Matrix for differential characteristic $P_e$ (left); matrix for linear expression ${}^tP_e$ (right).



3.3 Determining the Construction of Linear Layer 

In this section, we describe how to determine the construction of a linear layer from a given matrix P.

Determination algorithm

Step 1 Choose two rows, and XOR one row (row a) into the other row (row b); we call this a “primitive operation” hereafter.




A Strategy for Constructing Fast Round Functions with Practical Security 275 



Step 2 Transform the matrix P into the unit matrix I by primitive operations, counting the number of operations, in order to find the transformation order that yields the minimum number of operations.

Step 3 For each primitive operation, XOR line A, which corresponds to row a, into line B, which corresponds to row b, in reverse order of the transformation order found in Step 2.

For example, consider the matrix $P_e$. First, row 1 is XORed to row 2, and then row 4 is XORed to row 1. This is repeated until matrix $P_e$ is transformed into the unit matrix I. All states of the transformation are described as follows.
For the above transformation, the number of operations is 5. That is, the linear layer can be constructed with 5 XORs.

In reverse order of the above transformation order, line 1 is XORed to line 2, then line 2 is XORed to line 3, and so on. Finally, we obtain the linear layer shown in Fig. 3.
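The determination algorithm can be sketched in Python as follows. Breadth-first search over matrices is our choice for finding a shortest sequence of primitive operations (the text does not say how the minimum is found), and it is only practical for small n; the helper names are hypothetical. Rows are stored as n-bit integers with column j at bit n − 1 − j, and operations are 0-indexed. For $P_e$ the search finds 5 primitive operations, matching the count above, although the particular sequence, and hence the wiring, may differ from Fig. 3.

```python
from collections import deque


def min_primitive_ops(rows):
    """Shortest list of primitive operations (a, b), meaning row b ^= row a,
    that turns the matrix into the unit matrix I (Step 2)."""
    n = len(rows)
    start = tuple(rows)
    target = tuple(1 << (n - 1 - i) for i in range(n))
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            ops = []
            while parent[state] is not None:
                state, op = parent[state]
                ops.append(op)
            return ops[::-1]
        for a in range(n):
            for b in range(n):
                if a != b:
                    nxt = list(state)
                    nxt[b] ^= state[a]
                    nxt = tuple(nxt)
                    if nxt not in parent:
                        parent[nxt] = (state, (a, b))
                        queue.append(nxt)
    return None  # singular matrix: I is unreachable


def build_network(ops):
    """Step 3: apply the operations to the lines in reverse order."""
    return list(reversed(ops))


def run_network(network, z):
    lines = list(z)
    for a, b in network:
        lines[b] ^= lines[a]  # line a is XORed into line b
    return lines


def apply_rows(rows, z):
    """Direct evaluation of z'_i = XOR of z_j with t_ij = 1, for comparison."""
    n = len(rows)
    return [0 ^ sum(0 for _ in ()) if False else
            __import__("functools").reduce(lambda acc, j: acc ^ z[j],
                                           [j for j in range(n) if (r >> (n - 1 - j)) & 1], 0)
            for r in rows]


P_E_ROWS = [0b0111, 0b1011, 0b1110, 0b1111]
ops = min_primitive_ops(P_E_ROWS)
print(len(ops))                                      # 5
print(run_network(build_network(ops), [1, 2, 4, 8]))  # [14, 13, 7, 15]
```

The network computes $P_e z$ because undoing the row operations in reverse order rebuilds $P_e$ from I, one XOR per operation.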
Fig. 3. Example of linear layer represented as the matrix $P_e$



4 Round Function with n = 8 

In this section, we consider a round function with n = 8. Here, we assume that the linear layer and the s-boxes are bijective, because this enables us to evaluate the invulnerability against DC/LC more precisely in the case of a bijective round function, as described in Theorem 3.
4.1 Determination of Matrix Elements

We determine an 8 × 8 matrix P (n = 8) yielding the maximum value of $n_d$ using the search algorithm of Sect. 3.2. This time, we set the security threshold T = 8, 7, ... in this order until we found candidate matrices. Furthermore, the following conditions were added within the algorithm.

Step 3 Select a subset $P_c$ of n column vectors from the set C, and compute rank($P_c$) over GF(2). If rank($P_c$) ≠ 8, then reject subset $P_c$. Repeat the following steps until all subsets have been checked.

Step 3-1: Compute the minimum number of active s-boxes for $P_c$ as follows, where $t_{ia}$ is the element of $P_c$ in row $i$ and column $a$.

$$n_d = \min\{ n_{d_i} \mid 0 \le i \le 9 \}$$

- For any two columns (columns $a, b$; $0 \le i < 8$):
$$n_{d_0} = 2 + \min_{(a,b)} \#\{ (t_{ia}, t_{ib}) \mid t_{ia} \oplus t_{ib} \neq 0 \}$$

- For any three columns (columns $a, b, c$; $0 \le i < 8$):
$$n_{d_1} = 3 + \min_{(a,b,c)} \#\{ (t_{ia}, t_{ib}, t_{ic}) \mid t_{ia} \oplus t_{ib} \oplus t_{ic} \neq 0 \}$$
$$n_{d_2} = 3 + \min_{(a,b,c)} \#\{ (t_{ia}, t_{ib}, t_{ic}) \mid (t_{ia}, t_{ib}, t_{ic}) \notin \{(0,0,0), (1,1,1)\} \}$$

- For any four columns (columns $a, b, c, d$; $0 \le i < 8$):
$$n_{d_3} = 4 + \min_{(a,b,c,d)} \#\{ (t_{ia}, t_{ib}, t_{ic}, t_{id}) \mid t_{ia} \oplus t_{ib} \oplus t_{ic} \oplus t_{id} \neq 0 \}$$
$$n_{d_j} = 4 + \min_{(a,b,c,d)} \#\{ (t_{ia}, t_{ib}, t_{ic}, t_{id}) \mid (t_{ia}, t_{ib}, t_{ic}, t_{id}) \notin E_j \}, \quad 4 \le j \le 9,$$
with the excepted 4-tuples
$E_4 = \{(0,0,0,0), (1,1,0,0), (0,1,1,1), (1,0,1,1)\}$,
$E_5 = \{(0,0,0,0), (1,0,1,0), (0,1,1,1), (1,1,0,1)\}$,
$E_6 = \{(0,0,0,0), (1,0,0,1), (0,1,1,1), (1,1,1,0)\}$,
$E_7 = \{(0,0,0,0), (0,1,1,0), (1,0,1,1), (1,1,0,1)\}$,
$E_8 = \{(0,0,0,0), (0,1,0,1), (1,0,1,1), (1,1,1,0)\}$,
$E_9 = \{(0,0,0,0), (0,0,1,1), (1,1,0,1), (1,1,1,0)\}$.
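As a rough illustration, the minimum in Step 3-1 can be evaluated by brute force. The sketch below is one reading of the equations and not the authors' search program. It takes $P_c$ as a list of 0/1 rows with $t_{ia}$ = `P[i][a]` and lets each minimum range over sets of distinct columns in increasing order. The helper names are hypothetical. The rank test and the outer enumeration of subsets are omitted.

```python
from itertools import combinations

# Excepted 4-tuples (t_ia, t_ib, t_ic, t_id) for n_d4 ... n_d9, copied from Step 3-1.
FOUR_COLUMN_EXCEPTIONS = [
    {(0, 0, 0, 0), (1, 1, 0, 0), (0, 1, 1, 1), (1, 0, 1, 1)},
    {(0, 0, 0, 0), (1, 0, 1, 0), (0, 1, 1, 1), (1, 1, 0, 1)},
    {(0, 0, 0, 0), (1, 0, 0, 1), (0, 1, 1, 1), (1, 1, 1, 0)},
    {(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 1)},
    {(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 1), (1, 1, 1, 0)},
    {(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)},
]


def xor_is_nonzero(bits):
    """True if the XOR of the 0/1 entries in `bits` is 1."""
    return sum(bits) % 2 == 1


def count_rows(P, cols, is_counted):
    """Count rows i of P whose restriction (P[i][c] for c in cols) satisfies is_counted."""
    return sum(1 for row in P if is_counted(tuple(row[c] for c in cols)))


def min_active_sboxes(P):
    """n_d = min(n_d0, ..., n_d9) for a square 0/1 matrix P given as a list of rows."""
    n = len(P)
    values = []
    for cols in combinations(range(n), 2):
        values.append(2 + count_rows(P, cols, xor_is_nonzero))  # n_d0
    for cols in combinations(range(n), 3):
        values.append(3 + count_rows(P, cols, xor_is_nonzero))  # n_d1
        values.append(3 + count_rows(P, cols, lambda b: b not in {(0, 0, 0), (1, 1, 1)}))  # n_d2
    for cols in combinations(range(n), 4):
        values.append(4 + count_rows(P, cols, xor_is_nonzero))  # n_d3
        for excepted in FOUR_COLUMN_EXCEPTIONS:  # n_d4 ... n_d9
            values.append(4 + count_rows(P, cols, lambda b, e=excepted: b not in e))
    return min(values)


identity = [[1 if i == j else 0 for j in range(8)] for i in range(8)]
print(min_active_sboxes(identity))  # 4
```

For the 8 × 8 identity matrix the sketch returns 4. That value comes from $n_{d_0}$: two active s-boxes in each layer, which is below the $n_d = 5$ reported for the candidate matrices.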
Intuitively, the equations from $n_{d_0}$ to $n_{d_9}$ give the minimum total number of active s-boxes when the number of active s-boxes in the 1st non-linear layer is given.

For example, consider equation $n_{d_0}$.

When there are two active s-boxes in the 1st non-linear layer, the two input differences of the linear layer can be represented as $\Delta z_a \neq 0$, $\Delta z_b \neq 0$. Each output difference of the linear layer is $\Delta z'_i = t_{ia} \cdot \Delta z_a \oplus t_{ib} \cdot \Delta z_b$ $(0 \le i < 8)$. If we assume $\Delta z_a = \Delta z_b$ as the relationship between the two input differences, this becomes $\Delta z'_i = (t_{ia} \oplus t_{ib}) \cdot \Delta z_a$ $(0 \le i < 8)$. An active s-box in the 2nd non-linear layer is an s-box in that layer whose input difference is non-zero, i.e., $\Delta z'_i \neq 0$. Accordingly, the minimum number of active s-boxes in the 2nd non-linear layer is $\min_{(a,b)} \#\{\Delta z'_i \mid \Delta z'_i \neq 0, 0 \le i < 8\} = \min_{(a,b)} \#\{(t_{ia} \oplus t_{ib}) \mid t_{ia} \oplus t_{ib} \neq 0, 0 \le i < 8\}$. Adding the two active s-boxes of the 1st non-linear layer yields equation $n_{d_0}$.

Similarly, equations $n_{d_1}$ and $n_{d_2}$ are obtained from the relationships among three input differences. The equations from $n_{d_3}$ to $n_{d_9}$ are obtained from the relationships among four input differences of the linear layer.
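The reasoning for $n_{d_0}$ can be checked numerically. This sketch propagates two equal input differences through the linear layer and counts the non-zero output differences, i.e., the active s-boxes of the 2nd non-linear layer. The function names are hypothetical, m-bit differences are represented as Python integers, and $t_{ij}$ = `P[i][j]`.

```python
def output_differences(P, input_diffs):
    """Delta z'_i = XOR of Delta z_j over the columns j with t_ij = 1 (m-bit words as ints)."""
    out = []
    for row in P:
        acc = 0
        for t, dz in zip(row, input_diffs):
            if t:
                acc ^= dz
        out.append(acc)
    return out


def active_sboxes(diffs):
    """Number of s-boxes whose input difference is non-zero."""
    return sum(1 for d in diffs if d != 0)


# An arbitrary 8x8 0/1 matrix and two active s-boxes (a = 0, b = 1) with equal 8-bit differences.
P = [[(3 * i + 5 * j) % 7 % 2 for j in range(8)] for i in range(8)]
dz = [0x5A, 0x5A, 0, 0, 0, 0, 0, 0]
second_layer = active_sboxes(output_differences(P, dz))
# With Delta z_a = Delta z_b, row i is active exactly when t_ia XOR t_ib = 1.
print(second_layer, sum(1 for row in P if row[0] ^ row[1]))
```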

Using the above search algorithm, we find that there is no matrix with $n_d \ge 6$, and that there are 10080 candidate matrices with $n_d = 5$. Accordingly, the invulnerability against DC of a round function built from one of the candidate matrices is evaluated as $p \le p_s^5$. Likewise, its invulnerability against LC is evaluated as $q \le q_s^5$ because of Conjecture 2.

Furthermore, for all 10080 candidate matrices, the total Hamming weight is equal to 44. Each matrix consists of 4 column (row) vectors with Hamming weight six and 4 other column (row) vectors with Hamming weight five.

4.2 Construction of Linear Layer 

Next, from the above 10080 candidate matrices, we construct the linear layer by the determination algorithm described earlier. However, the computational complexity of determining the construction is very large. There are $(8 \times 7)^{16} \approx 2^{93}$ candidates when the linear layer consists of 16 XORs, so it is impossible to determine the construction exhaustively. Accordingly, we consider a linear layer that consists of four blocks with 8 m-bit inputs and 4 m-bit outputs (see Fig. 4(A)). Each block consists of 4 XORs, and every input passes through only one XOR, as described in Fig. 4(B). The linear layer again consists of 16 XORs, but the computational complexity is much lower, with about $(4 \times 3 \times 2 \times 1)^4 \approx 2^{18}$ candidates.
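The two candidate counts, $(8 \times 7)^{16}$ and $(4 \times 3 \times 2 \times 1)^4$, can be converted to powers of two as follows. Reading the exponents as 16 (one factor per XOR) and 4 (one factor per block) is an assumption of this sketch.

```python
import math

full_search = (8 * 7) ** 16                 # unrestricted layer of 16 XORs
restricted_search = math.factorial(4) ** 4  # (4 x 3 x 2 x 1)^4 for four 4-XOR blocks

print(f"log2 full       = {math.log2(full_search):.1f}")        # about 92.9
print(f"log2 restricted = {math.log2(restricted_search):.1f}")  # about 18.3
```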




Fig. 4. Candidate construction of linear layer (panels (A) and (B))



This restricted search determines that 57 linear layers can be constructed from among the 10080 candidate matrices.

One of these matrices is shown below, and the linear layer with matrix P is shown in the corresponding figure. Furthermore, it can be proven that, for this linear layer, the matrix for the linear expression is ${}^tP$, i.e., the matrix P transposed, using the concatenation rules.
5 Conclusion

This paper studied a strategy for constructing mn-bit round functions with the 
2-round SPN structure that yield sufficiently small values of the maximum differ- 
ential and linear probabilities p, q. This strategy regards a linear transformation 
layer as an n x n matrix P over {0,1}. We described the relationship between 
the matrix representation and the actual construction of the linear transforma- 
tion layer. We proposed a search algorithm for constructing the optimal linear 
transformation layer by using the matrix representation. 

Furthermore, with this algorithm, we determined the optimal linear transformation layer yielding probabilities $p \le p_s^5$, $q \le q_s^5$ in the case of n = 8.
Abstract. We introduce the notion of nonhomomorphicity as an al-
ternative criterion that forecasts nonlinear characteristics of a Boolean 
function. Although both nonhomomorphicity and nonlinearity reflect a 
“difference” between a Boolean function and all the affine functions, they 
are measured from different perspectives. We are interested in nonhomo- 
morphicity due to several reasons that include (1) unlike other criteria, 
we have not only established tight lower and upper bounds on the non- 
homomorphicity of a function, but also precisely identified the mean of 
nonhomomorphicity over all the Boolean functions on the same vector 
space, (2) the nonhomomorphicity of a function can be estimated effi- 
ciently, and in fact, we demonstrate a fast statistical method that works 
both on large and small dimensional vector spaces. 

Key Words: Boolean Functions, Cryptography, Nonhomomorphicity, 
Nonlinear Characteristics. 



1 Motivation of this Research 

It is known that a function $f$ on $V_n$ is affine if and only if $f$ satisfies the property that, for any even $k$ with $k \ge 4$,
$$f(u_1) \oplus \cdots \oplus f(u_k) = 0 \qquad (1)$$
whenever $u_1 \oplus \cdots \oplus u_k = 0$.

In addition, it can be verified that $f$ is affine if and only if there exists an even $k$ with $k \ge 4$ such that (1) holds whenever $u_1 \oplus \cdots \oplus u_k = 0$. Therefore we regard (1) as a characteristic property that is useful in telling a non-affine function from an affine one.

Now consider a non-affine function $f$ on $V_n$. Let $k$ be an even integer with $k \ge 4$ and $(u_1, \ldots, u_k)$ be a $k$-tuple with $u_1 \oplus \cdots \oplus u_k = 0$. If
$$f(u_1) \oplus \cdots \oplus f(u_k) = 0$$
then $f$ satisfies the affine property at the particular vector $(u_1, \ldots, u_k)$. On the other hand, if
$$f(u_1) \oplus \cdots \oplus f(u_k) = 1$$
then $f$ behaves in a way that is against the affine property at $(u_1, \ldots, u_k)$.

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 280-..., 1999.

The above observations motivate us to make two definitions. The homomorphicity of $f$ is the number of $k$-tuples $(u_1, \ldots, u_k)$ of vectors in $V_n$ with $u_1 \oplus \cdots \oplus u_k = 0$ for which the affine property (1) is satisfied. The nonhomomorphicity of $f$ is the number of such $k$-tuples for which the affine property (1) is not satisfied.

Nonhomomorphicity and nonlinearity are similar in that they both reflect a “distance” between a Boolean function and all the affine functions. The former differs from the latter in the way the “distance” is measured. Nonhomomorphicity has several interesting properties suggesting that it can serve as a useful nonlinearity indicator:

(1) Unlike other criteria, we have not only established tight lower and upper bounds on nonhomomorphicity, but also precisely identified the mean of nonhomomorphicity over all the Boolean functions on the same vector space.

(2) The nonhomomorphicity of a function can be estimated efficiently. In fact, we show a fast statistical method for estimating the nonhomomorphicity of a function. The computing time of the statistical method does not depend on the dimension (number of variables) of the function. This allows a computer program to analyze Boolean functions of higher dimensions efficiently.

2 Introduction to Boolean Functions 

Denote by $V_n$ the vector space of $n$-tuples of elements from $GF(2)$. The truth table of a function $f$ from $V_n$ to $GF(2)$ (or simply a function on $V_n$) is a $(0,1)$-sequence defined by $(f(\alpha_0), f(\alpha_1), \ldots, f(\alpha_{2^n-1}))$. The sequence of $f$ is a $(1,-1)$-sequence defined by $((-1)^{f(\alpha_0)}, (-1)^{f(\alpha_1)}, \ldots, (-1)^{f(\alpha_{2^n-1})})$, where $\alpha_0 = (0, \ldots, 0, 0)$, $\alpha_1 = (0, \ldots, 0, 1)$, ..., $\alpha_{2^n-1} = (1, \ldots, 1, 1)$. $f$ is said to be balanced if its truth table contains an equal number of ones and zeros.

Given two sequences $\tilde a = (a_1, \ldots, a_m)$ and $\tilde b = (b_1, \ldots, b_m)$, their component-wise product is defined by $\tilde a * \tilde b = (a_1b_1, \ldots, a_mb_m)$. In particular, if $m = 2^n$ and $\tilde a$, $\tilde b$ are the sequences of functions $f$ and $g$ on $V_n$ respectively, then $\tilde a * \tilde b$ is the sequence of $f \oplus g$.

Let $\tilde a = (a_1, \ldots, a_m)$ and $\tilde b = (b_1, \ldots, b_m)$ be two vectors (or sequences). The scalar product of $\tilde a$ and $\tilde b$, denoted by $\langle \tilde a, \tilde b \rangle$, is defined as the sum of the component-wise multiplications.

- When $\tilde a$ and $\tilde b$ are from $V_m$, $\langle \tilde a, \tilde b \rangle = a_1b_1 \oplus \cdots \oplus a_mb_m$, where the addition and multiplication are over $GF(2)$.
- When $\tilde a$ and $\tilde b$ are $(1,-1)$-sequences, $\langle \tilde a, \tilde b \rangle = \sum_{i=1}^m a_ib_i$, where the addition and multiplication are over the reals.

A $(1,-1)$-matrix $H$ of order $m$ is called a Hadamard matrix if $HH^T = mI_m$, where $H^T$ is the transpose of $H$ and $I_m$ is the identity matrix of order $m$. A Sylvester-Hadamard matrix of order $2^n$, denoted by $H_n$, is generated by the following recursive relation
$$H_0 = 1, \qquad H_n = \begin{pmatrix} H_{n-1} & H_{n-1} \\ H_{n-1} & -H_{n-1} \end{pmatrix}, \quad n = 1, 2, \ldots \qquad (2)$$
Let $\ell_i$, $0 \le i \le 2^n - 1$, be the $i$th row of $H_n$. Then $\ell_i$ is the sequence of a linear function $\varphi_i(x)$ defined by the scalar product $\varphi_i(x) = \langle \alpha_i, x \rangle$. Here $\alpha_i$ is the $i$th vector in $V_n$ according to the ascending lexicographic order. (See for instance Lemma 2 of [0].)
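A small sketch of recursion (2) follows. It checks, for $n = 3$, that row $i$ of $H_n$ is the sequence of $\varphi_i(x) = \langle \alpha_i, x \rangle$ and that $H_nH_n^T = 2^nI$. Vectors of $V_n$ are encoded as integers whose most significant bit is the first coordinate. That encoding is an assumption of the sketch, chosen to match the ascending lexicographic order.

```python
def sylvester_hadamard(n):
    """H_n from recursion (2): H_0 = [1], H_n = [[H_{n-1}, H_{n-1}], [H_{n-1}, -H_{n-1}]]."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def gf2_inner(a, b):
    """<a, b> over GF(2), with vectors of V_n encoded as n-bit integers."""
    return bin(a & b).count("1") % 2


n = 3
H = sylvester_hadamard(n)
N = 2 ** n
# Row i equals the (1,-1)-sequence of phi_i(x) = <alpha_i, x>, alpha_i = binary expansion of i.
rows_are_linear = all(H[i][x] == (-1) ** gf2_inner(i, x) for i in range(N) for x in range(N))
# Hadamard property H H^T = 2^n I.
is_hadamard = all(sum(H[i][l] * H[j][l] for l in range(N)) == (N if i == j else 0)
                  for i in range(N) for j in range(N))
print(rows_are_linear, is_hadamard)  # True True
```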

Definition 1. A function $f$ on $V_n$ is called an affine function if $f(x) = c \oplus a_1x_1 \oplus \cdots \oplus a_nx_n$, where $x = (x_1, \ldots, x_n)$ and each $a_j$ and $c$ are constants in $GF(2)$. In particular, $f$ is called a linear function if $c = 0$.



Definition 2. The Hamming weight of a $(0,1)$-sequence $\xi$ is the number of ones in the sequence. Given two functions $f$ and $g$ on $V_n$, the Hamming distance $d(f,g)$ between them is defined as the Hamming weight of the truth table of $f(x) \oplus g(x)$, where $x = (x_1, \ldots, x_n)$. The nonlinearity of $f$, denoted by $N_f$, is the minimal Hamming distance between $f$ and all the affine functions on $V_n$. That is, $N_f = \min_{i=1,2,\ldots,2^{n+1}} d(f, \varphi_i)$, where $\varphi_1, \varphi_2, \ldots, \varphi_{2^{n+1}}$ are the affine functions on $V_n$.

It is known that the nonlinearity of a function $f$ on $V_n$ can be expressed as
$$N_f = 2^{n-1} - \tfrac{1}{2}\max\{|\langle \xi, \ell_i \rangle|,\ 0 \le i \le 2^n - 1\} \qquad (3)$$
where $\xi$ is the sequence of $f$ and $\ell_0, \ldots, \ell_{2^n-1}$ are the rows of $H_n$, namely, the sequences of the linear functions on $V_n$. (For a proof of (3) see for instance Lemma 6 of [J].) In addition, the nonlinearity of a function on $V_n$ is at most $2^{n-1} - 2^{\frac{1}{2}n-1}$, namely, $N_f \le 2^{n-1} - 2^{\frac{1}{2}n-1}$.
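Equation (3) can be compared against the definition of nonlinearity by brute force. In this sketch (function names are illustrative), `walsh_spectrum` computes all $\langle \xi, \ell_i \rangle$ with the standard fast Walsh-Hadamard transform in Sylvester order. `nonlinearity_by_definition` scans all $2^{n+1}$ affine functions.

```python
import random


def walsh_spectrum(tt):
    """Return [<xi, l_i> for i in range(2^n)] via an in-place fast Walsh-Hadamard transform."""
    w = [1 - 2 * bit for bit in tt]  # xi = ((-1)^f(alpha_0), ...)
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for j in range(start, start + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w


def nonlinearity_by_eq3(tt):
    """N_f = 2^(n-1) - max_i |<xi, l_i>| / 2, i.e. equation (3)."""
    return len(tt) // 2 - max(abs(v) for v in walsh_spectrum(tt)) // 2


def nonlinearity_by_definition(tt):
    """Minimum Hamming distance to the 2^(n+1) affine functions c + <a, x>."""
    N = len(tt)
    best = N
    for a in range(N):
        for c in (0, 1):
            affine = [c ^ (bin(a & x).count("1") % 2) for x in range(N)]
            best = min(best, sum(p != q for p, q in zip(tt, affine)))
    return best


bent_tt = [((x >> 3) & (x >> 2) & 1) ^ ((x >> 1) & x & 1) for x in range(16)]  # x1x2 + x3x4
print(nonlinearity_by_eq3(bent_tt), nonlinearity_by_definition(bent_tt))  # 6 6
rng = random.Random(7)
random_tts = [[rng.randint(0, 1) for _ in range(32)] for _ in range(20)]
print(all(nonlinearity_by_eq3(t) == nonlinearity_by_definition(t) for t in random_tts))  # True
```

The function $x_1x_2 \oplus x_3x_4$ on $V_4$ reaches the bound $2^{3} - 2^{1} = 6$.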

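Given a function $f$ on $V_n$, the $(1,-1)$-matrix defined by $M = ((-1)^{f(\alpha_i \oplus \alpha_j)})$, where $\alpha_i, \alpha_j \in V_n$ and $0 \le i, j \le 2^n - 1$, is called the $(1,-1)$ incidence matrix, or simply the matrix of $f$. The following is attributed to R. L. McFarland [Q]:
$$M = 2^{-n} H_n \,\mathrm{diag}(\langle \xi, \ell_0 \rangle, \langle \xi, \ell_1 \rangle, \ldots, \langle \xi, \ell_{2^n-1} \rangle)\, H_n \qquad (4)$$
where $\xi$ is the sequence of the function $f$ on $V_n$, and $\ell_i$ is the $i$th row of $H_n$. Here $\mathrm{diag}(a, b, \ldots, c)$ denotes the diagonal matrix whose entries on the diagonal are $a, b, \ldots, c$.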
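McFarland's identity (4) is easy to confirm numerically for small $n$. The sketch below builds $H_n$, the values $\langle \xi, \ell_i \rangle$ and the matrix $M$ of random functions. It then compares $2^n M$ with $H_n \,\mathrm{diag}(\ldots)\, H_n$ in exact integer arithmetic. This is a check, not a proof, and the helper names are illustrative.

```python
import random


def gf2_inner(a, b):
    return bin(a & b).count("1") % 2


def check_mcfarland(tt):
    """Verify 2^n M = H_n diag(<xi, l_0>, ...) H_n for the truth table tt (length 2^n)."""
    N = len(tt)
    H = [[(-1) ** gf2_inner(i, j) for j in range(N)] for i in range(N)]               # H_n
    W = [sum((-1) ** (tt[x] ^ gf2_inner(i, x)) for x in range(N)) for i in range(N)]  # <xi, l_i>
    M = [[(-1) ** tt[i ^ j] for j in range(N)] for i in range(N)]                     # matrix of f
    for i in range(N):
        for j in range(N):
            rhs = sum(H[i][l] * W[l] * H[l][j] for l in range(N))
            if rhs != N * M[i][j]:
                return False
    return True


rng = random.Random(2024)
results = [check_mcfarland([rng.randint(0, 1) for _ in range(16)]) for _ in range(5)]
print(results)
```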

A function $f$ on $V_n$ is called a bent function if $\langle \xi, \ell_i \rangle^2 = 2^n$ for every $i = 0, 1, \ldots, 2^n - 1$, where $\xi$ is the sequence of $f$ and $\ell_i$ is a row in $H_n$. A bent function on $V_n$ exists only when $n$ is a positive even number, and it achieves the highest possible nonlinearity $2^{n-1} - 2^{\frac{1}{2}n-1}$.

3 Homomorphicity and Nonhomomorphicity 

The following lemma is important in this paper, as it explores a characteristic 
property of affine functions which will be useful in studying nonhomomorphicity. 

Lemma 1. Let $f$ be a function on $V_n$. Then

(i) $f$ is an affine function if and only if $f$ satisfies the property that, for any even $k$ with $k \ge 4$, $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ whenever $u_1 \oplus \cdots \oplus u_k = 0$,

(ii) $f$ is an affine function if and only if there exists an even $k$ with $k \ge 4$ such that $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ whenever $u_1 \oplus \cdots \oplus u_k = 0$.



Proof. Let $f$ be a function on $V_n$. We first prove Part (ii) of the lemma.

Assume that $f$ is affine. By using Definition 1, it is easy to verify that for any even $k$ with $k \ge 4$, $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ whenever $u_1 \oplus \cdots \oplus u_k = 0$. Conversely, assume that there exists an even $k$ with $k \ge 4$ such that $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ whenever $u_1 \oplus \cdots \oplus u_k = 0$. We now prove that $f$ is affine.

Let $u_1$ and $u_2$ be any two vectors in $V_n$. Obviously, the $k$ vectors $u_1, u_2, u_1 \oplus u_2, 0, \ldots, 0$ satisfy $u_1 \oplus u_2 \oplus (u_1 \oplus u_2) \oplus 0 \oplus \cdots \oplus 0 = 0$. From the assumption,
$$f(u_1) \oplus f(u_2) \oplus f(u_1 \oplus u_2) \oplus f(0) \oplus \cdots \oplus f(0) = 0 \qquad (5)$$
Consider two cases: $f(0) = 0$ and $f(0) = 1$.

Case 1: $f(0) = 0$. In this case $f(ca) = cf(a)$ holds for any vector $a \in V_n$ and any value $c \in GF(2)$. Hence (5) can be rewritten as
$$f(u_1 \oplus u_2) = f(u_1) \oplus f(u_2) \qquad (6)$$
where $u_1$ and $u_2$ are arbitrary.

Let $e_j$ denote the vector in $V_n$ whose $j$th component is one and whose other components are zero. For any fixed values $x_j$ in $GF(2)$, $j = 1, \ldots, n$, (6) gives $f(x_1e_1 \oplus \cdots \oplus x_ne_n) = f(x_1e_1) \oplus f(x_2e_2 \oplus \cdots \oplus x_ne_n)$. Applying (6) repeatedly, we have $f(x_1e_1 \oplus \cdots \oplus x_ne_n) = f(x_1e_1) \oplus f(x_2e_2) \oplus \cdots \oplus f(x_ne_n)$. Note that $f(0) = 0$ implies $f(ca) = cf(a)$, where $c$ is any value in $GF(2)$ and $a$ is any vector in $V_n$. Hence
$$f(x_1e_1 \oplus \cdots \oplus x_ne_n) = x_1f(e_1) \oplus \cdots \oplus x_nf(e_n) \qquad (7)$$
From the definition of $e_j$, $x_1e_1 \oplus \cdots \oplus x_ne_n = (x_1, \ldots, x_n)$. On the other hand, if we write $f(e_j) = a_j$, $j = 1, \ldots, n$, then (7) can be rewritten as $f(x_1, \ldots, x_n) = a_1x_1 \oplus \cdots \oplus a_nx_n$. This proves that $f$ is linear.

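Case 2: $f(0) = 1$. Set $g(x) = 1 \oplus f(x)$. Then $g(0) = 0$. Since $f(0)$ appears $k - 3$ times (an odd number of times) in (5), (5) becomes $f(u_1) \oplus f(u_2) \oplus f(u_1 \oplus u_2) = 1$, i.e., $g(u_1 \oplus u_2) = g(u_1) \oplus g(u_2)$. Hence, by the argument of Case 1, $g$ is linear: $g(x_1, \ldots, x_n) = b_1x_1 \oplus \cdots \oplus b_nx_n$, where each $b_j \in GF(2)$. Hence $f(x_1, \ldots, x_n) = 1 \oplus b_1x_1 \oplus \cdots \oplus b_nx_n$. This proves that $f$ is affine.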

We now prove Part (i) of the lemma. Assume that $f$ is affine. From Definition 1 it is easy to check that for any even $k$ with $k \ge 4$, $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ whenever $u_1 \oplus \cdots \oplus u_k = 0$. Conversely, assume $f$ satisfies the property that for any even $k$ with $k \ge 4$, $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ whenever $u_1 \oplus \cdots \oplus u_k = 0$. Then from Part (ii) of the lemma, $f$ must be affine. □

From the characteristic property shown in Lemma 1, suppose a function $f$ on $V_n$ satisfies $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ for a large number of $k$-tuples $(u_1, \ldots, u_k)$ of vectors in $V_n$ with $u_1 \oplus \cdots \oplus u_k = 0$. Then the function behaves more like an affine function. This leads us to introduce a new nonlinearity criterion.
Notation 1. Let $f$ be a function on $V_n$ and $k$ an even integer with $4 \le k \le 2^n$. For a constant $c \in GF(2)$, denote by $H^{(k)}_{f,c}$ the collection of ordered $k$-tuples $(u_1, \ldots, u_k)$ of vectors in $V_n$ with $u_1 \oplus \cdots \oplus u_k = 0$ satisfying $f(u_1) \oplus \cdots \oplus f(u_k) = c$.

Definition 3. Let $f$ be a function on $V_n$ and $k$ an even integer with $4 \le k \le 2^n$. We call $h^{(k)}_{f,0} = \#H^{(k)}_{f,0}$ the $k$th-order homomorphicity of $f$, and $h^{(k)}_{f,1} = \#H^{(k)}_{f,1}$ the $k$th-order nonhomomorphicity of $f$, where $\#S$ denotes the number of elements in a set $S$.

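Note that there exist $2^{(k-1)n}$ $k$-tuples $(u_1, \ldots, u_k)$ of vectors in $V_n$ satisfying $\bigoplus_{j=1}^k u_j = 0$, since $u_1, \ldots, u_{k-1}$ can be chosen freely and $u_k$ is then determined. Hence an interesting fact on $h^{(k)}_{f,0}$ and $h^{(k)}_{f,1}$ follows:

Lemma 2. Let $f$ be a function on $V_n$. Then $h^{(k)}_{f,0} + h^{(k)}_{f,1} = 2^{(k-1)n}$.

We note that Lemma 1 cannot be extended to the case of odd $k$. For instance, the constant function $f \equiv 1$ is affine, yet $f(u_1) \oplus \cdots \oplus f(u_k) = 1$ for every odd $k$. This explains why we have not defined homomorphicity or nonhomomorphicity for an odd order.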
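For small $n$ the homomorphicity and nonhomomorphicity can be counted directly from Definition 3. The sketch below uses a hypothetical helper `homomorphicity` and encodes vectors as integers. It enumerates $u_1, \ldots, u_{k-1}$ and sets $u_k = u_1 \oplus \cdots \oplus u_{k-1}$, so the two counts always add up to $2^{(k-1)n}$, as Lemma 2 states.

```python
from itertools import product


def homomorphicity(tt, n, k):
    """Return (h_{f,0}^(k), h_{f,1}^(k)) by enumerating all k-tuples with u_1 ^ ... ^ u_k = 0.

    u_1..u_{k-1} range over V_n (encoded as ints) and u_k is fixed by the XOR condition,
    so exactly 2^((k-1)n) tuples are visited.
    """
    counts = [0, 0]
    for us in product(range(2 ** n), repeat=k - 1):
        last = 0
        parity = 0
        for u in us:
            last ^= u
            parity ^= tt[u]
        parity ^= tt[last]  # add f(u_k)
        counts[parity] += 1
    return counts[0], counts[1]


n = 3
affine = [1 ^ (bin(x & 0b101).count("1") % 2) for x in range(2 ** n)]  # 1 + x1 + x3
quadratic = [(x >> 2) & (x >> 1) & 1 for x in range(2 ** n)]           # x1 x2
print(homomorphicity(affine, n, 4))     # (512, 0): nonhomomorphicity 0, as Lemma 1 predicts
print(homomorphicity(quadratic, n, 4))  # (320, 192)
print(homomorphicity([1] * 8, 3, 3))    # (0, 64): the constant 1 fails the property for odd k
```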

4 Calculations of Nonhomomorphicity 

4.1 High Order Auto-Correlation 

Recall that the auto-correlation of a function is defined as follows:

Definition 4. Let $f$ be a function on $V_n$. For a vector $\alpha \in V_n$, denote by $\xi(\alpha)$ the sequence of $f(x \oplus \alpha)$. Thus $\xi(0)$ is the sequence of $f$ itself and $\xi(0) * \xi(\alpha)$ is the sequence of $f(x) \oplus f(x \oplus \alpha)$. Let $\Delta(\alpha)$ be the scalar product of $\xi(0)$ and $\xi(\alpha)$, namely
$$\Delta(\alpha) = \langle \xi(0), \xi(\alpha) \rangle.$$
$\Delta(\alpha)$ is called the auto-correlation of $f$ with a shift $\alpha$.

Obviously, $\Delta(\alpha) = 0$ if and only if $f(x) \oplus f(x \oplus \alpha)$ is balanced, i.e., $f$ satisfies the propagation criterion with respect to $\alpha$. On the other hand, if $|\Delta(\alpha)| = 2^n$, then $f(x) \oplus f(x \oplus \alpha)$ is a constant and hence $\alpha$ is a linear structure of $f$.

Next we consider a generalization of the definition for auto-correlation. The 
generalization will turn out to be a useful tool in studying nonhomomorphic 
characteristics of functions. 

Definition 5. Let $f$ be a function on $V_n$ and $\xi = (a_0, a_1, \ldots, a_{2^n-1})$ be the sequence of $f$. For a vector $\alpha \in V_n$ and an integer $k = 2, 3, \ldots$, the $k$th-order auto-correlation of $f$ with a shift $\alpha$, denoted by $\Delta^{(k)}(\alpha)$, is defined as
$$\Delta^{(2)}(\alpha) = \Delta(\alpha), \qquad \Delta^{(k)}(\alpha) = \sum_{j=0}^{2^n-1} a_j\, \Delta^{(k-1)}(\alpha_j \oplus \alpha), \quad k = 3, 4, \ldots$$
where $\Delta(\alpha)$ is the auto-correlation of $f$ as defined in Definition 4, and $\alpha_j$ is the vector corresponding to the integer $j$.
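Definition 5 translates directly into a recursion that computes all shifts at once. The sketch below returns the whole table $\Delta^{(k)}(\alpha)$. The function names are illustrative, shifts are encoded as integers, and $\xi$ is given as a list of ±1.

```python
def autocorrelation(seq):
    """Delta(alpha) = <xi(0), xi(alpha)> for every alpha in V_n (encoded as ints)."""
    N = len(seq)
    return [sum(seq[x] * seq[x ^ a] for x in range(N)) for a in range(N)]


def autocorrelation_k(seq, k):
    """Delta^(k)(alpha) for all alpha: Delta^(k)(alpha) = sum_j a_j Delta^(k-1)(alpha_j ^ alpha)."""
    N = len(seq)
    delta = autocorrelation(seq)  # k = 2
    for _ in range(3, k + 1):
        delta = [sum(seq[j] * delta[j ^ a] for j in range(N)) for a in range(N)]
    return delta


tt = [0, 1, 1, 0, 1, 1, 1, 0]  # an arbitrary function on V_3
seq = [1 - 2 * b for b in tt]  # its (1,-1)-sequence xi = (a_0, ..., a_7)
print(autocorrelation_k(seq, 2)[0], autocorrelation_k(seq, 4)[0])  # first value is 2^n = 8
```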
It is important to point out that nonhomomorphicity, high-order auto-correlation and high-order derivation are three completely different concepts. Let $f$ be a function on $V_n$. In [Q], the derivation of $f$ at vector $\beta$, denoted by $\Delta_\beta f(x)$, is defined as follows:
$$\Delta_\beta f(x) = f(x) \oplus f(x \oplus \beta),$$
and the $k$th-order derivation of $f$ at vectors $\beta_1, \ldots, \beta_k$, denoted by $\Delta^{(k)}_{\beta_1, \ldots, \beta_k} f(x)$, is defined recursively as
$$\Delta^{(k)}_{\beta_1, \ldots, \beta_k} f(x) = \Delta_{\beta_k}\bigl(\Delta^{(k-1)}_{\beta_1, \ldots, \beta_{k-1}} f(x)\bigr).$$
We can see that the $k$th-order derivation of $f$ at vectors $\beta_1, \ldots, \beta_k$, $\Delta^{(k)}_{\beta_1, \ldots, \beta_k} f(x)$, is itself a function on $V_n$. In contrast, both the $k$th-order nonhomomorphicity and the $k$th-order auto-correlation of $f$ with a shift $\beta$ are fixed integer values.
To examine further how the three concepts differ, consider a bent function $f$ of degree $s$. For even $k$ with $k > s$, the $k$th-order derivation $\Delta^{(k)}_{\beta_1, \ldots, \beta_k} f(x)$ of $f$ at vectors $\beta_1, \ldots, \beta_k$ is obviously the zero function.

In contrast, the $k$th-order auto-correlation of $f$ satisfies $\Delta^{(k)}(0) = 2^{-n} \sum_{i=0}^{2^n-1} \langle \xi, \ell_i \rangle^k = 2^{\frac{1}{2}kn}$. This follows from Corollary 1 and Lemma 3, which are introduced later on.

Similarly, the $k$th-order nonhomomorphicity of $f$ satisfies $h^{(k)}_{f,1} = 2^{(k-1)n-1} - 2^{\frac{1}{2}kn-1}$, which follows from Theorem 3 in Section 5.

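To examine the properties of the $k$th-order auto-correlation $\Delta^{(k)}(\alpha)$, we consider the matrix defined by $(\Delta^{(k)}(\alpha_i \oplus \alpha_j))$, where $i, j = 0, 1, \ldots, 2^n - 1$. Note that the diagonal of the matrix $(\Delta^{(k)}(\alpha_i \oplus \alpha_j))$ is composed of $2^n$ repetitions of $\Delta^{(k)}(0)$. By simple induction on $k$, we have the following result:

Theorem 1. Let $f$ be a function on $V_n$, $M$ be the matrix of $f$ and $\xi$ be the sequence of $f$. Then
$$(\Delta^{(k)}(\alpha_i \oplus \alpha_j)) = M^k = 2^{-n} H_n \,\mathrm{diag}(\langle \xi, \ell_0 \rangle^k, \langle \xi, \ell_1 \rangle^k, \ldots, \langle \xi, \ell_{2^n-1} \rangle^k)\, H_n$$
where $\ell_0, \ell_1, \ldots, \ell_{2^n-1}$ are the rows of $H_n$.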

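This result shows that the two matrices $(\Delta^{(k)}(\alpha_i \oplus \alpha_j))$ and
$$\mathrm{diag}(\langle \xi, \ell_0 \rangle^k, \langle \xi, \ell_1 \rangle^k, \ldots, \langle \xi, \ell_{2^n-1} \rangle^k)$$
are similar, in the sense that from the former one can easily find the latter through the use of $H_n$, and vice versa. Furthermore, it is not hard to see that the sum of the entries on the diagonal of $(\Delta^{(k)}(\alpha_i \oplus \alpha_j))$ is identical to that of $\mathrm{diag}(\langle \xi, \ell_0 \rangle^k, \ldots, \langle \xi, \ell_{2^n-1} \rangle^k)$. In other words,
$$\sum_{i=0}^{2^n-1} \Delta^{(k)}(\alpha_i \oplus \alpha_i) = 2^n \Delta^{(k)}(0) = \sum_{i=0}^{2^n-1} \langle \xi, \ell_i \rangle^k.$$

Hence we have proved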
Corollary 1. Let $f$ be a function on $V_n$, $M$ be the matrix of $f$ and $\xi$ be the sequence of $f$. Then $\Delta^{(k)}(0) = 2^{-n} \sum_{i=0}^{2^n-1} \langle \xi, \ell_i \rangle^k$.

For $k = 2$, we have $\Delta^{(2)}(0) = \Delta(0) = 2^n$. This indicates that Corollary 1 embodies Parseval's equation (page 416 of [Q]), $\sum_{i=0}^{2^n-1} \langle \xi, \ell_i \rangle^2 = 2^{2n}$, as the special case $k = 2$.
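Corollary 1 can be spot-checked on a random function, including Parseval's equation as the case $k = 2$. This is a numerical sketch with illustrative helper names. The Walsh values are computed by direct summation, which is only practical for small $n$.

```python
import random


def walsh(seq):
    """<xi, l_i> for all i, by direct summation."""
    N = len(seq)
    return [sum(seq[x] * (-1) ** (bin(i & x).count("1") % 2) for x in range(N)) for i in range(N)]


def autocorrelation_k(seq, k):
    """Delta^(k)(alpha) for all alpha, following Definition 5."""
    N = len(seq)
    delta = [sum(seq[x] * seq[x ^ a] for x in range(N)) for a in range(N)]
    for _ in range(3, k + 1):
        delta = [sum(seq[j] * delta[j ^ a] for j in range(N)) for a in range(N)]
    return delta


rng = random.Random(5)
n = 4
seq = [rng.choice((1, -1)) for _ in range(2 ** n)]
W = walsh(seq)
for k in range(2, 8):
    lhs = 2 ** n * autocorrelation_k(seq, k)[0]  # 2^n Delta^(k)(0)
    rhs = sum(w ** k for w in W)                 # sum_i <xi, l_i>^k
    print(k, lhs == rhs)
print(sum(w * w for w in W) == 2 ** (2 * n))     # Parseval (k = 2)
```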



4.2 Expression of Nonhomomorphicity by Other Indicators 

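Recall that, by (3), the nonlinearity of a function $f$ on $V_n$ is related to the maximum $|\langle \xi, \ell_i \rangle|$, where $\xi$ is the sequence of $f$ and $\ell_i$ is the $i$th row of $H_n$. We give a precise expression of nonhomomorphicity by using the same indicator.

Theorem 2. For a function $f$ on $V_n$ and an even $k$ with $4 \le k \le 2^n$, $h^{(k)}_{f,0}$ and $h^{(k)}_{f,1}$ can be expressed as follows:

(i) $h^{(k)}_{f,0} = 2^{(k-1)n-1} + \frac{1}{2}\Delta^{(k)}(0) = 2^{(k-1)n-1} + 2^{-n-1} \sum_{i=0}^{2^n-1} \langle \xi, \ell_i \rangle^k$,

(ii) $h^{(k)}_{f,1} = 2^{(k-1)n-1} - \frac{1}{2}\Delta^{(k)}(0) = 2^{(k-1)n-1} - 2^{-n-1} \sum_{i=0}^{2^n-1} \langle \xi, \ell_i \rangle^k$,

where $\xi$ is the sequence of $f$ and $\ell_i$ denotes the $i$th row of $H_n$.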



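Proof. We need only prove that $h^{(k)}_{f,1} = 2^{(k-1)n-1} - \frac{1}{2}\Delta^{(k)}(0)$. The rest of the theorem follows from Corollary 1 and the fact that $h^{(k)}_{f,0} + h^{(k)}_{f,1} = 2^{(k-1)n}$.

Write $\xi = (a_0, a_1, \ldots, a_{2^n-1})$ where each $a_j = \pm 1$. Consider $u_j \in V_n$, $j = 1, \ldots, k$, with $\bigoplus_{j=1}^k u_j = 0$. Clearly, $f(u_j) = 1$ if and only if $a_{u_j} = -1$, where the subscript $u_j$ in $a_{u_j}$ is viewed as the integer representation of the vector $u_j$. It is easy to verify
$$\frac{1}{2}\Bigl(1 - \prod_{j=1}^k a_{u_j}\Bigr) = \begin{cases} 1 & \text{if } \bigoplus_{j=1}^k f(u_j) = 1, \\ 0 & \text{if } \bigoplus_{j=1}^k f(u_j) = 0. \end{cases}$$
Hence
$$\begin{aligned}
h^{(k)}_{f,1} &= \frac{1}{2} \sum_{u_1 \oplus \cdots \oplus u_k = 0} \bigl(1 - a_{u_1} a_{u_2} \cdots a_{u_k}\bigr) \\
&= 2^{(k-1)n-1} - \frac{1}{2} \sum_{u_1, \ldots, u_{k-1} \in V_n} a_{u_1} a_{u_2} \cdots a_{u_{k-1}}\, a_{u_1 \oplus u_2 \oplus \cdots \oplus u_{k-1}} \\
&= 2^{(k-1)n-1} - \frac{1}{2} \sum_{u_1, \ldots, u_{k-2} \in V_n} a_{u_1} \cdots a_{u_{k-2}} \sum_{u_{k-1} \in V_n} a_{u_{k-1}}\, a_{u_1 \oplus \cdots \oplus u_{k-2} \oplus u_{k-1}} \\
&= 2^{(k-1)n-1} - \frac{1}{2} \sum_{u_1, \ldots, u_{k-2} \in V_n} a_{u_1} \cdots a_{u_{k-2}}\, \Delta(u_1 \oplus u_2 \oplus \cdots \oplus u_{k-2}) \\
&= 2^{(k-1)n-1} - \frac{1}{2} \sum_{u_1, \ldots, u_{k-3} \in V_n} a_{u_1} \cdots a_{u_{k-3}} \sum_{u_{k-2} \in V_n} a_{u_{k-2}}\, \Delta(u_1 \oplus \cdots \oplus u_{k-3} \oplus u_{k-2}) \\
&= 2^{(k-1)n-1} - \frac{1}{2} \sum_{u_1, \ldots, u_{k-3} \in V_n} a_{u_1} \cdots a_{u_{k-3}}\, \Delta^{(3)}(u_1 \oplus u_2 \oplus \cdots \oplus u_{k-3}) \\
&\;\;\vdots \\
&= 2^{(k-1)n-1} - \frac{1}{2} \sum_{u_1, u_2 \in V_n} a_{u_1} a_{u_2}\, \Delta^{(k-2)}(u_1 \oplus u_2) \\
&= 2^{(k-1)n-1} - \frac{1}{2} \sum_{u_1 \in V_n} a_{u_1}\, \Delta^{(k-1)}(u_1) \\
&= 2^{(k-1)n-1} - \frac{1}{2} \Delta^{(k)}(0).
\end{aligned}$$
This completes the proof. □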
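Theorem 2 (ii) can be cross-checked against a direct count for small parameters. The sketch compares the enumeration of Definition 3 with $2^{(k-1)n-1} - 2^{-n-1}\sum_i \langle \xi, \ell_i \rangle^k$, evaluated with exact fractions, for $n = 3$ and $k = 4, 6$. The helper names are illustrative.

```python
import random
from fractions import Fraction
from itertools import product


def walsh(tt):
    """<xi, l_i> for all i, by direct summation over the truth table."""
    N = len(tt)
    return [sum((-1) ** (tt[x] ^ (bin(i & x).count("1") % 2)) for x in range(N)) for i in range(N)]


def nonhomomorphicity_bruteforce(tt, n, k):
    """h_{f,1}^(k): number of k-tuples with XOR zero and f(u_1) ^ ... ^ f(u_k) = 1."""
    count = 0
    for us in product(range(2 ** n), repeat=k - 1):
        last, parity = 0, 0
        for u in us:
            last ^= u
            parity ^= tt[u]
        count += parity ^ tt[last]
    return count


def nonhomomorphicity_theorem2(tt, n, k):
    """Theorem 2 (ii): 2^((k-1)n-1) - 2^(-n-1) * sum_i <xi, l_i>^k, evaluated exactly."""
    total = sum(w ** k for w in walsh(tt))
    return Fraction(2 ** ((k - 1) * n - 1)) - Fraction(total, 2 ** (n + 1))


rng = random.Random(11)
n = 3
checks = []
for _ in range(3):
    tt = [rng.randint(0, 1) for _ in range(2 ** n)]
    for k in (4, 6):
        checks.append(nonhomomorphicity_bruteforce(tt, n, k) == nonhomomorphicity_theorem2(tt, n, k))
print(all(checks))  # True
```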



5 Bounds on Nonhomomorphicity

First we introduce Hölder's Inequality, which will be used in our discussions on lower and upper bounds. It states that for real numbers $c_j \ge 0$, $d_j \ge 0$, $j = 1, \ldots, k$, and $p$, $q$ with $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, the following is true:
$$\sum_{j=1}^k c_jd_j \le \Bigl(\sum_{j=1}^k c_j^p\Bigr)^{1/p} \Bigl(\sum_{j=1}^k d_j^q\Bigr)^{1/q}$$
where the equality holds if and only if there exists a constant $\nu \ge 0$ such that $c_j^p = \nu d_j^q$ for each $j = 1, \ldots, k$.

By using Hölder's Inequality, we can prove

Lemma 3. Let $f$ be a function on $V_n$, $\xi$ the sequence of $f$, and $k$ an even integer with $k \ge 4$. Then
$$\sum_{i=0}^{2^n-1} \langle \xi, \ell_i \rangle^k \ge 2^{(\frac{1}{2}k+1)n}$$
where the equality holds if and only if $n$ is even and $f$ is bent.
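The inequality of Lemma 3 can be observed numerically. In this sketch (illustrative names), `lemma3_gap` returns $\sum_i \langle \xi, \ell_i \rangle^k - 2^{(\frac12 k+1)n}$. The gap is zero for the bent function $x_1x_2 \oplus x_3x_4$ on $V_4$ and non-negative for a sample of random functions.

```python
import random


def walsh_spectrum(tt):
    """Fast Walsh-Hadamard transform of the sequence of the truth table tt."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for s in range(0, len(w), 2 * h):
            for j in range(s, s + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w


def lemma3_gap(tt, n, k):
    """sum_i <xi, l_i>^k - 2^((k/2 + 1) n); Lemma 3 says this is >= 0 for even k."""
    return sum(w ** k for w in walsh_spectrum(tt)) - 2 ** ((k // 2 + 1) * n)


n = 4
bent = [((x >> 3) & (x >> 2) & 1) ^ ((x >> 1) & x & 1) for x in range(2 ** n)]  # x1x2 + x3x4
rng = random.Random(3)
randoms = [[rng.randint(0, 1) for _ in range(2 ** n)] for _ in range(200)]
for k in (4, 6, 8):
    print(k, lemma3_gap(bent, n, k), min(lemma3_gap(tt, n, k) for tt in randoms))
```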

Armed with the above result, next we show a bound on nonhomomorphicity. 

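Theorem 3. Let $f$ be a function on $V_n$ and $k$ an even integer with $k \ge 4$. Then the following statements hold:

(i) $h^{(k)}_{f,1}$ satisfies
$$2^{(k-1)n-1} - \tfrac{1}{2}(2^n - 2N_f)^k \le h^{(k)}_{f,1} \le 2^{(k-1)n-1} - 2^{\frac{1}{2}kn-1}$$
where $N_f$ denotes the nonlinearity of $f$,

(ii) the equality on the right-hand side holds if and only if $f$ is bent. In other words, $f$ is bent if and only if
$$h^{(k)}_{f,1} = 2^{(k-1)n-1} - 2^{\frac{1}{2}kn-1}.$$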

Recall that the nonlinearity of a function reaches its minimum if and only if the function is affine, and reaches its maximum if and only if the function is bent. The above theorem shows that there exists a consistent relationship between nonlinearity and nonhomomorphicity, especially when the order of nonhomomorphicity is large. Thus, if $h^{(k)}_{f,1}$ is large, we expect that $f$ is closer to a bent function than to an affine one. Conversely, if $h^{(k)}_{f,1}$ is small, then the function is closer to affine than to bent. We also have the following complementary result:

Corollary 2. Let $f$ be a function on $V_n$ and $k$ an even integer with $k \ge 4$. Then the following statements hold:

(i) $h^{(k)}_{f,0}$ satisfies
$$2^{(k-1)n-1} + 2^{\frac{1}{2}kn-1} \le h^{(k)}_{f,0} \le 2^{(k-1)n-1} + \tfrac{1}{2}(2^n - 2N_f)^k \qquad (10)$$
where $N_f$ denotes the nonlinearity of $f$,

(ii) the equality on the left-hand side of (10) holds if and only if $f$ is bent. In other words, $f$ is bent if and only if
$$h^{(k)}_{f,0} = 2^{(k-1)n-1} + 2^{\frac{1}{2}kn-1}.$$

A consequence of Theorem 3 and Corollary 2 is

Corollary 3. Let $f$ be a function on $V_n$ and $k$ an even integer with $k \ge 4$. Then $h^{(k)}_{f,0} - h^{(k)}_{f,1} \ge 2^{\frac{1}{2}kn}$, and the equality holds if and only if $f$ is bent.

An implication of the above corollary is that there exists no function on $V_n$ such that $h^{(k)}_{f,0} = h^{(k)}_{f,1}$.



6 Comparing Nonhomomorphicity and Nonlinearity 

A natural question on nonhomomorphicity is how it is related to other nonlinear characteristics, such as nonlinearity, which indicates the minimum distance between a particular function and all the affine functions. It turns out that nonhomomorphicity and nonlinearity are two indicators that are not directly comparable. We demonstrate this by inspecting three specific functions $f$, $g$ and $h$ on $V_{2s}$ with $s \ge 5$.

Recall that the rows in $H_s$, the Sylvester-Hadamard matrix of order $2^s$, are denoted by $\ell_i$, $i = 0, 1, \ldots, 2^s - 1$. The three functions are defined as follows:
1. $f$: the sequence of $f$ is the concatenation of $\ell_1, \ell_2, \ldots, \ell_{2^s-1}$ with $\ell_1$ being repeated twice, i.e., $\ell_1, \ell_1, \ell_2, \ldots, \ell_{2^s-1}$.

2. $g$: the sequence of $g$ is composed of four repetitions of a bent sequence $\eta$ of length $2^{2s-2}$, i.e., $\eta, \eta, \eta, \eta$.

3. $h$: the sequence of $h$ is the concatenation of $\ell_1, \ell_4, \ldots, \ell_{2^s-1}$ with $\ell_1$ being repeated four times, i.e., $\ell_1, \ell_1, \ell_1, \ell_1, \ell_4, \ldots, \ell_{2^s-1}$.

By using (3), we know that the nonlinearities of the three functions are $N_f = N_g = 2^{2s-1} - 2^s$ and $N_h = 2^{2s-1} - 2^{s+1}$.

Consider $k$ even with $k \ge 4$. By Theorem 2, we have the following nonhomomorphic characteristics for the three functions:
$$h^{(k)}_{f,1} = 2^{2(k-1)s-1} - 2^{-2s-1}\bigl(2^{sk+k+s-1} + 2^{sk+2s} - 2^{sk+s+1}\bigr),$$
$$h^{(k)}_{g,1} = 2^{2(k-1)s-1} - 2^{-2s-1} \cdot 2^{sk+k+2s-2},$$
$$h^{(k)}_{h,1} = 2^{2(k-1)s-1} - 2^{-2s-1}\bigl(2^{sk+2k+s-2} + 2^{sk+2s} - 2^{sk+s+2}\bigr).$$

Thus for these three functions $f$, $g$ and $h$, their nonlinearities and nonhomomorphic characteristics are related as follows:

(i) $f$ vs. $g$: $N_f = N_g$, but $h^{(k)}_{f,1} > h^{(k)}_{g,1}$.

(ii) $f$ vs. $h$: $N_f > N_h$, and $h^{(k)}_{f,1} > h^{(k)}_{h,1}$.

(iii) $g$ vs. $h$: $N_g > N_h$, but $h^{(k)}_{g,1} < h^{(k)}_{h,1}$ if $k \le s - 1$, and $h^{(k)}_{g,1} > h^{(k)}_{h,1}$ if $k \ge s$.

Properties of these three functions clearly show that nonlinearity and non- 
homomorphicity are not comparable indicators. They, however, can be used to 
complement each other in studying cryptographic properties of functions. 
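The nonlinearities and the three closed forms for $h^{(k)}_{f,1}$, $h^{(k)}_{g,1}$, $h^{(k)}_{h,1}$ can be reproduced for $s = 5$. The sketch builds the three sequences and applies (3) and Theorem 2. It takes $\eta$ to be the bent sequence of $\langle y, z \rangle$ on $V_{s-1} \times V_{s-1}$. That choice of bent sequence is an assumption, since the text allows any bent $\eta$.

```python
from fractions import Fraction


def fwht(seq):
    """Fast Walsh-Hadamard transform of a (1,-1)-sequence: returns <xi, l_i> for every i."""
    w = list(seq)
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for j in range(start, start + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w


def hadamard_row(i, s):
    """l_i, the i-th row of the Sylvester-Hadamard matrix H_s."""
    return [(-1) ** (bin(i & x).count("1") % 2) for x in range(2 ** s)]


def nonlinearity(seq):
    """Equation (3)."""
    return len(seq) // 2 - max(abs(v) for v in fwht(seq)) // 2


def h1(seq, k):
    """Theorem 2 (ii), evaluated exactly."""
    n = len(seq).bit_length() - 1
    power_sum = sum(w ** k for w in fwht(seq))
    return Fraction(2 ** ((k - 1) * n - 1)) - Fraction(power_sum, 2 ** (n + 1))


def closed_forms(s, k):
    """The three expressions for h_{f,1}, h_{g,1}, h_{h,1} given in the text."""
    base = Fraction(2 ** (2 * (k - 1) * s - 1))
    scale = Fraction(1, 2 ** (2 * s + 1))
    hf = base - scale * (2 ** (s * k + k + s - 1) + 2 ** (s * k + 2 * s) - 2 ** (s * k + s + 1))
    hg = base - scale * 2 ** (s * k + k + 2 * s - 2)
    hh = base - scale * (2 ** (s * k + 2 * k + s - 2) + 2 ** (s * k + 2 * s) - 2 ** (s * k + s + 2))
    return hf, hg, hh


s = 5
f_seq = hadamard_row(1, s) * 2 + [v for i in range(2, 2 ** s) for v in hadamard_row(i, s)]
half = s - 1  # assumed bent eta: (-1)^<y, z> on V_{s-1} x V_{s-1}, length 2^(2s-2)
eta = [(-1) ** (bin((x >> half) & x & (2 ** half - 1)).count("1") % 2) for x in range(2 ** (2 * half))]
g_seq = eta * 4
h_seq = hadamard_row(1, s) * 4 + [v for i in range(4, 2 ** s) for v in hadamard_row(i, s)]

print(nonlinearity(f_seq), nonlinearity(g_seq), nonlinearity(h_seq))  # 480 480 448
for k in (4, 6):
    computed = (h1(f_seq, k), h1(g_seq, k), h1(h_seq, k))
    print(k, computed == closed_forms(s, k), computed[1] < computed[2])
```

With $s = 5$, the order $k = 4$ satisfies $k \le s - 1$ and $k = 6$ satisfies $k \ge s$, so the two orders show the reversal between $g$ and $h$ described in (iii).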

The two functions $g$ and $h$ are of particular interest: while $h^{(k)}_{g,1} < h^{(k)}_{h,1}$ for $k \le s - 1$, their positions are reversed for $k \ge s$. This motivates us to examine the behavior of nonhomomorphicity as $k$ becomes large.

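Theorem 4. Let $f$ and $g$ be two functions on $V_n$. Suppose $h^{(k)}_{f,1} \ne h^{(k)}_{g,1}$ for some even $k$. Then there is an even positive $k_0$ such that either $h^{(k)}_{f,1} < h^{(k)}_{g,1}$ for every even $k \ge k_0$, or $h^{(k)}_{f,1} > h^{(k)}_{g,1}$ for every even $k \ge k_0$.

Assume that $N_f > N_g$. Then from (3), we have
$$\max\{|\langle \xi, \ell_i \rangle|, 0 \le i \le 2^n - 1\} < \max\{|\langle \eta, \ell_i \rangle|, 0 \le i \le 2^n - 1\},$$
where $\xi$ and $\eta$ are the sequences of $f$ and $g$, respectively.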

Using a proof similar to that of the above theorem, we can show

Theorem 5. Let $f$ and $g$ be two functions on $V_n$. If $N_f > N_g$, then there is an even positive $k_0$ such that $h^{(k)}_{f,1} > h^{(k)}_{g,1}$ for every even $k$ with $k \ge k_0$.
Theorem 5 shows that nonhomomorphicity and nonlinearity are consistent when the order $k$ is large. Nevertheless, the three example functions $f$, $g$ and $h$, together with Theorems 4 and 5, indicate that the nonhomomorphic characteristics of a function cannot be fully predicted by other cryptographic criteria, such as nonlinearity. Therefore, nonhomomorphicity can serve as another important indicator that forecasts certain cryptographically useful properties of the function.

Comparing (ii) of Theorem 2 with (3), we find that both nonlinearity and nonhomomorphicity reflect non-affine characteristics. However, the former focuses on the maximum $|\langle \xi, \ell_i \rangle|$, while the latter takes all $|\langle \xi, \ell_i \rangle|$ into account.



7 The Mean of Homomorphicity and Nonhomomorphicity 

Let $f$ be a function on $V_n$, $\chi$ denote an indicator (a criterion or a value), and $\chi_f$ denote the indicator of $f$. Note that there are precisely $2^{2^n}$ functions on $V_n$. We are concerned with the mean of the indicator $\chi$ over all the functions on $V_n$, denoted by $\bar\chi$, i.e., $\bar\chi = 2^{-2^n} \sum_f \chi_f$.

The upper and lower bounds on $\chi_f$ cannot provide sufficient information on the distribution of $\chi$ over the majority of functions. For this reason, we argue that the mean $\bar\chi$ of the indicator over all the functions on $V_n$ should also be investigated.

Notation 2. Let $O_k$ ($k$ even) denote the collection of $k$-tuples $(u_1, \ldots, u_k)$ of vectors in $V_n$ satisfying $u_{j_1} = u_{j_2}, \ldots, u_{j_{k-1}} = u_{j_k}$, where $\{j_1, j_2, \ldots, j_k\} = \{1, 2, \ldots, k\}$. Write $o_k = \#O_k$.

It is easy to verify

Lemma 4. Let $n$ and $k$ be positive integers and $u_1 \oplus \cdots \oplus u_k = 0$, where each $u_j$ is a fixed vector in $V_n$. Then
$$f(u_1) \oplus \cdots \oplus f(u_k) = 0$$
holds for every function $f$ on $V_n$ if and only if $k$ is even and $(u_1, \ldots, u_k) \in O_k$.



Lemma 5. In Notation 2, let $k$ be an even integer with $2 \le k \le 2^n$. Then
$$o_k = \sum_{t=1}^{k/2} \binom{2^n}{t} \sum_{\substack{p_1 + \cdots + p_t = k/2 \\ p_j > 0}} \frac{k!}{(2p_1)! \cdots (2p_t)!}$$



Proof. Let $(u_1, \ldots, u_k) \in O_k$. Then the multiset $\{u_1, \ldots, u_k\}$ can be divided into $t$ disjoint subsets $\Pi_1, \ldots, \Pi_t$ where

(1) $1 \le t \le k$,
(2) each $\Pi_j$ consists of $2p_j$ ($p_j > 0$) copies of a vector $\beta_j$, i.e., $\Pi_j = \{\beta_j, \ldots, \beta_j\}$ and $|\Pi_j| = 2p_j$,
(3) $\beta_j \ne \beta_i$ if $j \ne i$,
(4) $\{u_1, \ldots, u_k\} = \Pi_1 \cup \cdots \cup \Pi_t$.

Note that there exist $\binom{2^n}{t}$ different choices of $t$ distinct vectors $\beta_1, \ldots, \beta_t$ from $V_n$. Arranging each multiset $\{u_1, \ldots, u_k\}$, we obtain precisely $k!/((2p_1)! \cdots (2p_t)!)$ distinct ordered tuples. Note that $2p_1 + \cdots + 2p_t = k$ and $1 \le t \le k/2$. The proof is completed. □
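Lemma 5 can be checked against direct enumeration for small $n$ and $k$. In the sketch (illustrative names), `o_k_formula` evaluates the double sum. `o_k_bruteforce` uses the fact that a $k$-tuple lies in $O_k$ exactly when every vector occurs an even number of times.

```python
from collections import Counter
from itertools import product
from math import comb, factorial


def compositions(total, parts):
    """All ordered tuples of `parts` positive integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def o_k_formula(n, k):
    """Lemma 5: sum_t C(2^n, t) * sum over p_1 + ... + p_t = k/2 of k! / ((2p_1)! ... (2p_t)!)."""
    total = 0
    for t in range(1, k // 2 + 1):
        inner = 0
        for ps in compositions(k // 2, t):
            denom = 1
            for p in ps:
                denom *= factorial(2 * p)
            inner += factorial(k) // denom
        total += comb(2 ** n, t) * inner
    return total


def o_k_bruteforce(n, k):
    """Count k-tuples over V_n in which every vector occurs an even number of times."""
    return sum(1 for us in product(range(2 ** n), repeat=k)
               if all(c % 2 == 0 for c in Counter(us).values()))


print(o_k_formula(2, 4), o_k_bruteforce(2, 4))  # 40 40
print(o_k_formula(2, 6), o_k_bruteforce(2, 6))  # 544 544
```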

From Lemma 4, if $(u_1, \ldots, u_k) \in O_k$, then $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ holds for every function $f$ on $V_n$. Therefore, in this case $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ with $u_1 \oplus \cdots \oplus u_k = 0$ does not really reflect an affine property. Hence we focus on $H^{(k)}_{f,0} - O_k$ and $H^{(k)}_{f,1}$.

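Theorem 6. Let $k$ be an even integer with $2 \le k \le 2^n$. Then

(i) the mean of $h^{(k)}_{f,0}$ over all the functions on $V_n$, i.e., $2^{-2^n} \sum_f h^{(k)}_{f,0}$, satisfies
$$2^{-2^n} \sum_f h^{(k)}_{f,0} = \tfrac{1}{2}\bigl(2^{(k-1)n} + o_k\bigr)$$
where $o_k$ is given in Lemma 5,

(ii) the mean of $h^{(k)}_{f,1}$ over all the functions on $V_n$, i.e., $2^{-2^n} \sum_f h^{(k)}_{f,1}$, satisfies
$$2^{-2^n} \sum_f h^{(k)}_{f,1} = \tfrac{1}{2}\bigl(2^{(k-1)n} - o_k\bigr).$$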

Proof. To prove Part (i), we consider two cases for a $k$-tuple $(u_1, \ldots, u_k)$ of vectors in $V_n$ with $u_1 \oplus \cdots \oplus u_k = 0$.

Case 1: $(u_1, \ldots, u_k) \in O_k$. From Lemma 4, $f(u_1) \oplus \cdots \oplus f(u_k) = 0$ holds for every function $f$ on $V_n$.

Case 2: $(u_1, \ldots, u_k) \notin O_k$. Then some vector occurs an odd number of times among $u_1, \ldots, u_k$. Hence, for a random function $f$ on $V_n$, $f(u_1) \oplus \cdots \oplus f(u_k)$ takes the value zero and the value one each with probability one half. Therefore
$$2^{-2^n} \sum_f h^{(k)}_{f,0} = 2^{-2^n} \sum_f \#O_k + 2^{-2^n} \sum_f \#\bigl(H^{(k)}_{f,0} - O_k\bigr) = o_k + \tfrac{1}{2}\bigl(2^{(k-1)n} - o_k\bigr).$$
This proves (i) of the theorem.

Part (ii) can be proven in a similar way. Again, for a random function $f$ on $V_n$, $f(u_1) \oplus \cdots \oplus f(u_k)$ takes the value zero and the value one each with probability one half. □
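Theorem 6 can be confirmed exhaustively for $n = 2$, $k = 4$. In that case there are only $2^{2^2} = 16$ functions and $2^{3 \cdot 2} = 64$ tuples with $u_1 \oplus \cdots \oplus u_4 = 0$. The sketch averages both counts over all functions with exact fractions; the helper names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def in_O(us):
    """(u_1, ..., u_k) is in O_k iff every vector occurs an even number of times."""
    return all(c % 2 == 0 for c in Counter(us).values())


def mean_homomorphicity(n, k):
    """Average (h_{f,0}^(k), h_{f,1}^(k)) over all 2^(2^n) functions on V_n, plus o_k."""
    N = 2 ** n
    tuples = []
    for head in product(range(N), repeat=k - 1):
        last = 0
        for u in head:
            last ^= u
        tuples.append(head + (last,))
    totals = [0, 0]
    for tt in product((0, 1), repeat=N):
        for us in tuples:
            parity = 0
            for u in us:
                parity ^= tt[u]
            totals[parity] += 1
    count = 2 ** N
    o_k = sum(1 for us in tuples if in_O(us))  # O_k is contained in the XOR-zero tuples
    return Fraction(totals[0], count), Fraction(totals[1], count), o_k


n, k = 2, 4
mean0, mean1, o_k = mean_homomorphicity(n, k)
print(mean0, mean1, o_k)  # 52 12 40
print(mean0 == Fraction(2 ** ((k - 1) * n) + o_k, 2), mean1 == Fraction(2 ** ((k - 1) * n) - o_k, 2))
```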

A function whose nonhomomorphicity is larger than the mean, namely $h^{(k)}_{f,1} > 2^{-2^n} \sum_f h^{(k)}_{f,1}$, is more nonlinear. The converse also holds.
8 Relative Nonhomomorphicity



The concept of relative nonhomomorphicity introduced in this section is useful 
for a statistical tool to be introduced later. 



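Notation 3. Let $k$ be an even integer with $k \ge 4$ and $R_k$ denote the collection of ordered $k$-tuples $(u_1, \ldots, u_k)$ of vectors in $V_n$ satisfying $u_1 \oplus \cdots \oplus u_k = 0$.

We have already noted
$$\#R_k = 2^{(k-1)n} \quad \text{and} \quad \#(R_k - O_k) = 2^{(k-1)n} - o_k. \qquad (11)$$
From the proof of Theorem 6, if $(u_1, \ldots, u_k) \in R_k - O_k$, then $f(u_1) \oplus \cdots \oplus f(u_k)$ takes the value zero and the value one with equal probability for a random function $f$ on $V_n$.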



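Definition 6. Let $f$ be a function on $V_n$ and $k$ an even integer with $k \ge 4$. Define the $k$th-order relative nonhomomorphicity of $f$, denoted by $\rho^{(k)}_{f,1}$, as
$$\rho^{(k)}_{f,1} = \frac{h^{(k)}_{f,1}}{\#(R_k - O_k)}, \quad \text{i.e.,} \quad \rho^{(k)}_{f,1} = \frac{h^{(k)}_{f,1}}{2^{(k-1)n} - o_k}.$$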



From Theorem 6 we obtain

Corollary 4. Let $k$ be an even integer with $2 \le k \le 2^n$. Then the mean of $\rho^{(k)}_{f,1}$ over all the functions on $V_n$, i.e., $2^{-2^n} \sum_f \rho^{(k)}_{f,1}$, satisfies $2^{-2^n} \sum_f \rho^{(k)}_{f,1} = \frac{1}{2}$.

From Corollary 4, if $\rho^{(k)}_{f,1} \ge \frac{1}{2}$, then the nonhomomorphicity of $f$ is not smaller than the mean. If $\rho^{(k)}_{f,1} < \frac{1}{2}$, then the nonhomomorphicity of $f$ is smaller than the mean.

In practice, if $\rho^{(k)}_{f,1}$ is much smaller than $\frac{1}{2}$, then $f$ should be considered cryptographically weak.
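Definition 6 and Corollary 4 can be illustrated by exact enumeration for small parameters. The sketch (illustrative names) removes the tuples of $O_k$ from $R_k$ and divides $h^{(k)}_{f,1}$ by the remaining count. For $n = 4$, $k = 4$, the bent function $x_1x_2 \oplus x_3x_4$ gives $\rho = 1920/3360 = 4/7 > \frac12$, and an affine function gives 0. The average over all functions on $V_2$ is $\frac12$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def relative_nonhomomorphicity(tt, n, k):
    """rho_{f,1}^(k) = h_{f,1}^(k) / #(R_k - O_k), computed by full enumeration of R_k."""
    h1 = 0
    size = 0
    for head in product(range(2 ** n), repeat=k - 1):
        last = 0
        for u in head:
            last ^= u
        us = head + (last,)
        if all(c % 2 == 0 for c in Counter(us).values()):
            continue  # tuple lies in O_k and is excluded
        size += 1
        parity = 0
        for u in us:
            parity ^= tt[u]
        h1 += parity
    return Fraction(h1, size)


n, k = 4, 4
bent = [((x >> 3) & (x >> 2) & 1) ^ ((x >> 1) & x & 1) for x in range(2 ** n)]
affine = [bin(x & 0b1011).count("1") % 2 for x in range(2 ** n)]
print(relative_nonhomomorphicity(bent, n, k))    # 4/7
print(relative_nonhomomorphicity(affine, n, k))  # 0
rhos = [relative_nonhomomorphicity(list(tt), 2, 4) for tt in product((0, 1), repeat=4)]
print(sum(rhos) / len(rhos))                     # 1/2, as in Corollary 4
```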



9 Estimating Nonhomomorphicity 



As shown in Theorem 2, the nonhomomorphicity of a function can be determined precisely. In this section, however, we introduce a statistical method to estimate nonhomomorphicity. Such a method is useful in fast analysis of functions. Define a real-valued $(0,1)$ function $t(u_1, \ldots, u_k)$ on $R_k - O_k$ as follows:
$$t(u_1, \ldots, u_k) = \begin{cases} 1, & \text{if } f(u_1) \oplus \cdots \oplus f(u_k) = 1, \\ 0, & \text{otherwise.} \end{cases}$$
Hence from the definition of nonhomomorphicity we have
$$h^{(k)}_{f,1} = \sum_{(u_1, \ldots, u_k) \in R_k - O_k} t(u_1, \ldots, u_k).$$
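Let $\Omega$ be a random subset of $R_k - O_k$. Write $\omega = \#\Omega$ and
$$\bar t = \frac{1}{\omega} \sum_{(u_1, \ldots, u_k) \in \Omega} t(u_1, \ldots, u_k). \qquad (13)$$
Note that $\bar t$ is the “sample mean”. In particular, if $\Omega = R_k - O_k$, then $\bar t$ is identified with the “true mean” or “population mean”, namely $\rho^{(k)}_{f,1}$.

Now consider $\sum_{(u_1, \ldots, u_k) \in \Omega} (t(u_1, \ldots, u_k) - \bar t)^2$. We have
$$\sum_{(u_1, \ldots, u_k) \in \Omega} (t(u_1, \ldots, u_k) - \bar t)^2 = \sum_{(u_1, \ldots, u_k) \in \Omega} t^2(u_1, \ldots, u_k) - 2\bar t \sum_{(u_1, \ldots, u_k) \in \Omega} t(u_1, \ldots, u_k) + \omega \bar t^2.$$
Note that $t^2(u_1, \ldots, u_k) = t(u_1, \ldots, u_k)$. From (13),
$$\sum_{(u_1, \ldots, u_k) \in \Omega} (t(u_1, \ldots, u_k) - \bar t)^2 = \omega \bar t - 2\omega \bar t^2 + \omega \bar t^2 = \omega \bar t (1 - \bar t) \qquad (14)$$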

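Hence the quantity $\sqrt{\frac{1}{\omega - 1} \sum_{(u_1, \ldots, u_k) \in \Omega} (t(u_1, \ldots, u_k) - \bar t)^2}$, which is called the “sample standard deviation” [E] and is usually denoted by $\mu$, can be expressed as
$$\mu = \sqrt{\frac{1}{\omega - 1} \sum_{(u_1, \ldots, u_k) \in \Omega} (t(u_1, \ldots, u_k) - \bar t)^2} = \sqrt{\frac{\omega \bar t (1 - \bar t)}{\omega - 1}} \qquad (15)$$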



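By using (4.4) in Section 4.B of [?], the "true mean" or "population mean", $\rho_f^{(k)}$, can be bounded by

$$\bar{t} - Z_{\epsilon/2} \frac{\mu}{\sqrt{\omega}} \le \rho_f^{(k)} \le \bar{t} + Z_{\epsilon/2} \frac{\mu}{\sqrt{\omega}}, \qquad (16)$$

where $Z_{\epsilon/2}$ denotes the value $Z$ of a "standardized normal distribution" that has a fraction $\epsilon/2$ of the data to its right. (16) holds with a probability of $(1 - \epsilon)100\%$ [?]. For example: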



when $\epsilon = 0.2$, $Z_{\epsilon/2} = 1.28$, and (16) holds with a probability of 80%,
when $\epsilon = 0.1$, $Z_{\epsilon/2} = 1.64$, and (16) holds with a probability of 90%,
when $\epsilon = 0.05$, $Z_{\epsilon/2} = 1.96$, and (16) holds with a probability of 95%,
when $\epsilon = 0.02$, $Z_{\epsilon/2} = 2.33$, and (16) holds with a probability of 98%,
when $\epsilon = 0.01$, $Z_{\epsilon/2} = 2.57$, and (16) holds with a probability of 99%,
when $\epsilon = 0.001$, $Z_{\epsilon/2} = 3.3$, and (16) holds with a probability of 99.9%.
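The quoted values of $Z_{\epsilon/2}$ can be reproduced with the inverse normal CDF. The Python sketch below uses the standard library's `statistics.NormalDist`. The function name `z_half` is illustrative.

```python
from statistics import NormalDist


def z_half(eps):
    """Z_{eps/2}: standard normal quantile with a fraction eps/2 of the mass to its right."""
    return NormalDist().inv_cdf(1 - eps / 2)


# (epsilon, Z value quoted in the text)
TABLE = {0.2: 1.28, 0.1: 1.64, 0.05: 1.96, 0.02: 2.33, 0.01: 2.57, 0.001: 3.3}

for eps, quoted in TABLE.items():
    print(f"eps={eps:<6} Z={z_half(eps):.4f} quoted={quoted} confidence={(1 - eps) * 100:.1f}%")
```

The quoted figures are short roundings of the exact quantiles. For example, the exact value for $\epsilon = 0.01$ is about 2.576.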
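From $0 \le \bar{t} \le 1$ and (15), it is easy to verify that $\mu$ in (16) satisfies $0 \le \mu \le \frac{1}{2}\sqrt{\frac{\omega}{\omega - 1}}$, a bound that is essentially $\frac{1}{2}$ when $\omega$ is large. This implies that $\mu$ can simply be replaced by $\frac{1}{2}$ in (16), giving

$$\bar{t} - \frac{Z_{\epsilon/2}}{2\sqrt{\omega}} \le \rho_f^{(k)} \le \bar{t} + \frac{Z_{\epsilon/2}}{2\sqrt{\omega}}, \qquad (17)$$

where (17) holds with $(1 - \epsilon)100\%$ probability. Hence if $\omega$, i.e. $\#\Omega$, is large, then the lower and upper bounds on $\rho_f^{(k)}$ in (17) are close to each other. Equivalently, if we choose $\omega = \#\Omega$ large enough, then $\frac{Z_{\epsilon/2}}{2\sqrt{\omega}}$ is sufficiently small, and hence (17) will provide us with useful information.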

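For instance, in view of Corollary 4 and (17), we can choose $\omega = \#\Omega$ such that $\frac{Z_{\epsilon/2}}{2\sqrt{\omega}} \le 10^{-p}$. Hence $\omega \ge Z_{\epsilon/2}^2 \cdot 10^{2p}$ is large enough. In this case (17) is specialized as

$$\bar{t} - 10^{-p} \le \rho_f^{(k)} \le \bar{t} + 10^{-p}, \qquad (18)$$

where (18) holds with $(1 - \epsilon)100\%$ probability.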
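The sample-size rule $\omega \ge Z_{\epsilon/2}^2 \cdot 10^{2p}$ is easy to evaluate. The Python sketch below computes the smallest such $\omega$ and the resulting half-width of the interval in (17). The function names are illustrative.

```python
import math
from statistics import NormalDist


def z_half(eps):
    return NormalDist().inv_cdf(1 - eps / 2)


def sample_size(eps, p):
    """Smallest integer omega with omega >= Z_{eps/2}^2 * 10^(2p)."""
    return math.ceil(z_half(eps) ** 2 * 10 ** (2 * p))


def half_width(eps, omega):
    """Half-width Z_{eps/2} / (2 sqrt(omega)) of the interval in (17)."""
    return z_half(eps) / (2 * math.sqrt(omega))


omega = sample_size(0.05, 2)
print(omega, half_width(0.05, omega))
```

The rule is conservative by a factor of two. Since $\sqrt{\omega} \ge Z_{\epsilon/2} \cdot 10^{p}$, the half-width is at most $10^{-p}/2$, so with $\epsilon = 0.05$ and $p = 2$ it is about 0.005.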

In summary, we can analyze the nonhomomorphic characteristics of a function on $V_n$ in the following steps:



1. fix an even $k$ with $k \ge 4$, for example $k = 4$, 6 or 8, fix a large integer $\omega$, for example $\omega \ge Z_{\epsilon/2}^2 \cdot 10^{2p}$, and randomly choose a subset $\Omega$ of $R_k - O_k$ with $\#\Omega = \omega$;

2. by using (13), determine $\bar{t}$, i.e. "the sample mean";

3. by using (18), determine the range of $\rho_f^{(k)}$ with high reliability;

4. in view of $\rho_f^{(k)}$ in (18), from Corollary 4,

   if $\rho_f^{(k)} \ge \frac{1}{2}$, then $f$ is not less nonhomomorphic than the mean;
   if $\rho_f^{(k)} < \frac{1}{2}$, then $f$ is less nonhomomorphic than the mean, (19)

   where (19) holds with $(1 - \epsilon)100\%$ probability;

5. if $\rho_f^{(k)}$ is much smaller than $\frac{1}{2}$, then $f$ should be considered cryptographically weak.
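The steps above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. Tuples are drawn uniformly from $R_k$ by choosing $u_1, \ldots, u_{k-1}$ at random and setting $u_k$ to their XOR. Tuples in $O_k$ are then rejected, under the same assumption as before: $O_k$ is the set of tuples in which every vector occurs an even number of times. The estimate uses (13), (15) and (16). The "undecided" outcome, for an interval that straddles $\frac{1}{2}$, is a choice made by this sketch. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def sample_tuple(n, k, rng):
    """Uniform k-tuple from R_k - O_k by rejection (O_k assumed: all counts even)."""
    while True:
        head = [rng.getrandbits(n) for _ in range(k - 1)]
        last = 0
        for u in head:
            last ^= u
        tup = head + [last]
        if any(c % 2 for c in Counter(tup).values()):
            return tup


def estimate_rho(f, n, k, omega, eps, seed=0):
    """Steps 1-3: sample mean (13), deviation (15) and the interval of (16)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(omega):
        s = 0
        for u in sample_tuple(n, k, rng):
            s ^= f(u)
        hits += s
    t_bar = hits / omega
    mu = math.sqrt(omega * t_bar * (1 - t_bar) / (omega - 1))
    delta = NormalDist().inv_cdf(1 - eps / 2) * mu / math.sqrt(omega)
    return t_bar, t_bar - delta, t_bar + delta


def classify(t_bar, low, high):
    """Step 4: compare the interval with the mean 1/2 from Corollary 4."""
    if low >= 0.5:
        return "not less nonhomomorphic than the mean"
    if high < 0.5:
        return "less nonhomomorphic than the mean"
    return "undecided at this confidence level"


# Example: a random function on V_16 (a stand-in, not a real cipher component).
n = 16
table_rng = random.Random(1)
table = [table_rng.getrandbits(1) for _ in range(1 << n)]
print(classify(*estimate_rho(lambda u: table[u], n, 4, 20000, 0.05)))
```

Only $\omega$ tuples are evaluated, so the cost does not depend on enumerating $V_n$. This is the second advantage listed below.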



We have noticed that the statistical analysis has the following advantages:

(1) the relative nonhomomorphicity $\rho_f^{(k)}$ can be precisely identified by the use of the "population mean" or "true mean",

(2) by using this method we do not need to search through the entire $V_n$,

(3) the method is highly reliable.



10 Extensions to S-boxes 

Obviously, the concept of nonhomomorphicity of a Boolean function can be extended to that of an S-box in a straightforward way. Analysis of the general case of an S-box, however, has turned out to be far more complex. Nevertheless, we have obtained a number of interesting results on S-boxes, some of which
encompass results presented in this paper. We will report the new results in a 
forthcoming paper. In the same paper we will also discuss how to utilize nonho- 
momorphic characteristics of an S-box employed by a block cipher in analyzing 
cryptographic weaknesses of the cipher. 

11 Conclusions 

Nonhomomorphicity is a new indicator for nonlinear characteristics of a function. 
It can complement the more widely used indicator of nonlinearity. Two useful 
properties of nonhomomorphicity are: (1) the mean of nonhomomorphicity over 
all the Boolean functions over the same vector space can be precisely identified, 
(2) the nonhomomorphicity of a function can be estimated efficiently, regardless 
of the dimension of the vector space. 
Abstract. We present an attack on the ORYX stream cipher that requires only 25-27 bytes of known plaintext and has time complexity of $2^{16}$. This attack directly recovers the full 96-bit internal state of ORYX,
regardless of the key schedule. We also extend these techniques to show 
how to break ORYX even under a ciphertext-only model. As the ORYX 
cipher is used to encrypt the data transmissions in the North American 
Cellular system, these results are further evidence that many of the en- 
cryption algorithms used in second generation mobile communications 
offer a low level of security. 



1 Introduction 

The demand for mobile communications systems has increased dramatically in 
the last few years. Since cellular communications are sent over a radio link, it 
is easy to eavesdrop on such systems without detection. To protect privacy and 
prevent fraud, cryptographic algorithms have been employed to provide a more 
secure mobile communications environment. First generation mobile communications devices were analog. Analog cellphones rarely use encryption, and in any case analog encryption devices offered a very low level of security [?]. Over the last five years digital mobile communications systems have emerged, such as the Global System for Mobile Communications (GSM) standard developed in Europe and several Telecommunications Industry Association (TIA) standards developed in North America [?]. For these digital systems, a much higher level of security, using
modern encryption algorithms, is possible. Unfortunately, algorithms which of- 
fer a high level of security have not been used in mobile telecommunications to 
date. 

In the case of GSM telephony, it is shown in [?] that it may be possible to conduct a known plaintext attack against the voice privacy algorithm used in GSM telephones, the A5 cipher. More recently it was shown in [?] that it is

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 296-^^| 1999. 

(c) Springer-Verlag Berlin Heidelberg 1999 
possible to clone GSM telephones by conducting a chosen-challenge attack on
the COMP128 authentication algorithm. 

The North American digital cellular standards designed by the TIA, including time division multiple access (TDMA) and code division multiple access (CDMA), both use roughly the same security architecture. The four cryptographic primitives used in these systems and described in the TIA standard [?] are:



— CAVE, for challenge-response authentication protocols and key generation. 

— ORYX, a LFSR-based stream cipher for wireless data services. 

— CMEA, a simple block cipher used to encrypt message data on the traffic 
channel. 

— For voice privacy, TDMA systems use an XOR mask, or CDMA systems use 
keyed spread spectrum techniques combined with an LFSR mask. 

The voice privacy algorithm in TDMA systems is especially weak since it is based 
on a repeated XOR mask. Such a system can be easily attacked using ciphertext 
alone [?]. The CMEA algorithm is susceptible to a known plaintext attack [?].
In this paper the security of the ORYX algorithm is examined. 

ORYX is a simple stream cipher based on binary linear feedback shift regis- 
ters (LFSRs) that has been proposed for use in North American digital cellular 
systems to protect cellular data transmissions [3]. The cipher ORYX is used
as a keystream generator. The output of the generator is a random-looking 
sequence of bytes. Encryption is performed by XORing the keystream bytes 
with the data bytes to form ciphertext. Decryption is performed by XORing 
the keystream bytes with the ciphertext to recover the plaintext. Hence known 
plaintext-ciphertext pairs can be used to recover segments of the keystream. In 
this paper, the security of ORYX is examined with respect to a known plaintext 
attack conducted under the assumption that the cryptanalyst knows the com- 
plete structure of the cipher and the secret key is only the initial states of the 
component LFSRs. 

For this attack, we assume that the complete structure of the cipher, includ- 
ing the LFSR feedback functions, is known to the cryptanalyst. The key is only 
the initial states of the three 32 bit LFSRs: a total keysize of 96 bits. There 
is a complicated key schedule which decreases the total keyspace to something 
easily searchable using brute-force techniques; this reduces the key size to 32 
bits for export. However, ORYX is apparently intended to be a strong algorithm 
when used with a better key schedule that provides a full 96 bits of entropy. The 
attack proposed in this paper makes no use of the key schedule and is applicable 
to ORYX whichever key schedule is used. 

2 The ORYX Cipher 

The cipher ORYX has four components: three 32-bit LFSRs, which we denote LFSR_A, LFSR_B and LFSR_K, and an S-box containing a known permutation L of the integer values 0 to 255, inclusive. The feedback function for LFSR_K is

+ x^'^ + x^^ + x^^ + + a;® + x® + a: + 1. 

The feedback functions for LFSR_A are

a.32 ^26 ^23 ^22 ^16 ^12 + a;^® + a;® + a;’' + a;® + a;^‘ + a;^ + a; + 1 

and 

x®^+x^^+x^®+a;^®+a;^'^+x^®+a;^^+a;®^+a;^®+a;^®+a;^°+a;®+a;®+a;^+a;^+a;+l. 

The feedback function for LFSR_B is

a;®^ + a;®® + a;^® + a;^° + a;®® + a;®® + a;® + a;® + a; + 1. 

The permutation L is fixed for the duration of a call, and is formed from 
a known algorithm, initialized with a value which is transmitted in the clear 
during call setup. Each keystream byte is generated as follows: 

1. LFSR_K is stepped once.

2. LFSR_A is stepped once, with one of two different feedback polynomials depending on the content of a stage of LFSR_K.

3. LFSR_B is stepped either once or twice, depending on the content of another stage in LFSR_K.

4. The high bytes of the current states of LFSR_K, LFSR_A and LFSR_B are combined to form a keystream byte using the combining function:

$$\text{Keystream} = (\text{High8K} + L[\text{High8A}] + L[\text{High8B}]) \bmod 256$$
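The combining function is a plain modulo-256 sum, so it can be solved for High8K once the other two high bytes are fixed. The attack in Section 3.1 relies on exactly this. The Python sketch below shows both directions. The permutation L is a hypothetical random stand-in: the real one is produced per call by ORYX's table-generation algorithm, which is not reproduced here.

```python
import random


def keystream_byte(high8k, high8a, high8b, perm):
    """ORYX combining function: (High8K + L[High8A] + L[High8B]) mod 256."""
    return (high8k + perm[high8a] + perm[high8b]) % 256


def recover_high8k(z, high8a, high8b, perm):
    """Solve the combining function for High8K given the other two high bytes."""
    return (z - perm[high8a] - perm[high8b]) % 256


# Stand-in for the per-call permutation L (an assumption, not the real table).
rng = random.Random(2024)
L = list(range(256))
rng.shuffle(L)

z = keystream_byte(0x3C, 0xA5, 0x0F, L)
print(z, recover_high8k(z, 0xA5, 0x0F, L))  # second value is 60 (0x3C)
```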

3 Attack Procedure 

Since ORYX has a 96-bit keyspace, it is not feasible to simply guess the whole 
generator initial state and check if the guess is correct. However, if the generator 
initial state can be divided into smaller parts, and it is possible to guess one small 
part of the generator initial state, and incrementally check whether that guess is 
correct, the generator can be attacked. The attack presented in this paper uses this divide-and-conquer approach, and is a refinement of a method originally proposed in [?]. A feature of ORYX that contributes to the efficiency of the attack is that the two stages of LFSR_K whose contents control the selection of the feedback polynomial for LFSR_A and the number of times LFSR_B is stepped are both within the high eight stages of LFSR_K. Since the keystream bytes are formed from the contents of the high eight stages of each of the three LFSR states, we divide the keyspace and focus our attack on these 24 bits.
3.1 Attack Algorithm

Denote the high eight bits of the three LFSRs at the time the $i$th byte of keystream is produced by High8A(i), High8B(i) and High8K(i). The initial contents are High8A(0), High8B(0) and High8K(0), and all registers are stepped before the first byte of keystream, denoted Z(1), is produced. To produce a keystream byte Z(i + 1) at time instant i + 1, LFSR_K is stepped once, then LFSR_A is stepped once, then LFSR_B is stepped either once or twice. The contents of High8A(i + 1), High8B(i + 1) and High8K(i + 1) are then combined to form the keystream byte Z(i + 1). Therefore, there is no need to guess all 24 bits. If we guess the contents of High8A(1) and High8B(1), we can use the first byte of the known keystream, Z(1), and the combining function to calculate the corresponding contents of High8K(1), namely High8K(1) = (Z(1) - L[High8A(1)] - L[High8B(1)]) mod 256. Thus the attack requires exhaustive search of only a 16-bit subkey: the contents of High8A(1) and High8B(1).

For a particular 16-bit guess of High8A(1) and High8B(1), we use Z(1) to calculate the corresponding contents of High8K(1). After this calculation, the attack proceeds iteratively as we construct a path of guesses of High8A(i), High8B(i) and High8K(i) that are consistent with the known keystream. In each iteration a set of predictions for the next keystream byte is formed, and the guess is evaluated by comparing the known keystream byte with the predicted values.

In the $i$th iteration, we exploit the fact that after the three LFSRs are stepped to produce the next output byte, High8K(i + 1) and High8A(i + 1) effectively have one unknown bit shifting into them. Depending on High8K(i + 1), High8B(i + 1) has either one or two unknown bits for each byte of output. We try all possible combinations of these new input bits, a total of 12 combinations, and compute the output byte for each case. At most, there will be 12 distinct output bytes consistent with the guess of High8A(i), High8B(i) and High8K(i). We compare the known keystream byte Z(i + 1) with the predicted output bytes.

If Z(i + 1) is the same as one of the predicted output bytes, for the case where there are 12 distinct outputs, then a single possible set of values exists for High8K(i + 1), High8A(i + 1) and High8B(i + 1). We use these values in the next iteration of the attack.

Occasionally there are fewer than 12 distinct outputs, and the keystream byte equals the predicted output byte for more than one combination of new input bits. In that case we must consider more than one possible set of values for High8K(i + 1), High8A(i + 1) and High8B(i + 1). That is, the path of consistent guesses we are following may branch. In this situation we conduct a depth-first search.

If the keystream byte is not the same as any of the predicted output bytes, then the guessed contents of High8A(i) and High8B(i) were obviously incorrect. We go back along the path to the last branching point and start to trace out another path. If we search every possible path without finding a path of consistent guesses whose length equals the number of bytes of known keystream, then the guessed contents of High8A(1) and High8B(1) are obviously incorrect, and we make a new 16-bit guess and repeat the procedure.
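The guess-and-extend procedure can be illustrated end to end on a toy model. Treat this Python sketch as an illustration of the search structure only, not as an attack on real ORYX. It tracks only the three high bytes and ignores the lower LFSR stages and the real feedback taps. It assumes each high byte shifts up one stage per step with an unknown bit entering from below. It further assumes that LFSR_B shifts twice when the new low bit of High8K is 1 and once otherwise. This is a hypothetical control rule, chosen because it yields the 12 candidate combinations stated in the text. The permutation is a random stand-in for L.

```python
import random
from itertools import product


def keystream_byte(a, b, k, perm):
    """Combining function: (High8K + L[High8A] + L[High8B]) mod 256."""
    return (k + perm[a] + perm[b]) % 256


def successors(a, b, k):
    """All candidate (High8A, High8B, High8K) one keystream byte later (toy model).

    Each high byte shifts up one stage with an unknown bit entering from below.
    LFSR_B shifts twice when the new low bit of High8K is 1 (assumed rule),
    giving 2 * 2 + 2 * 4 = 12 candidates.
    """
    out = []
    for kbit, abit in product((0, 1), repeat=2):
        k2 = ((k << 1) | kbit) & 0xFF
        a2 = ((a << 1) | abit) & 0xFF
        steps = 2 if k2 & 1 else 1
        for bbits in range(1 << steps):
            out.append((a2, ((b << steps) | bbits) & 0xFF, k2))
    return out


def extend_path(state, z, pos, perm):
    """Depth-first search for a path of states consistent with z[pos:]."""
    if pos == len(z):
        return [state]
    for nxt in successors(*state):
        if keystream_byte(*nxt, perm) == z[pos]:
            rest = extend_path(nxt, z, pos + 1, perm)
            if rest is not None:
                return [state] + rest
    return None  # dead end: the caller backtracks to its next branch


def attack(z, perm):
    """Try all 2^16 guesses of (High8A(1), High8B(1)); High8K(1) follows from Z(1)."""
    survivors = []
    for a, b in product(range(256), repeat=2):
        k = (z[0] - perm[a] - perm[b]) % 256
        path = extend_path((a, b, k), z, 1, perm)
        if path is not None:
            survivors.append(path)
    return survivors


def toy_keystream(n_bytes, perm, rng):
    """Keystream from random high bytes evolving under the same toy model."""
    state = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
    states = [state]
    for _ in range(n_bytes - 1):
        state = rng.choice(successors(*state))
        states.append(state)
    return [keystream_byte(*s, perm) for s in states], states


rng = random.Random(7)
perm = list(range(256))
rng.shuffle(perm)
z, states = toy_keystream(25, perm, rng)
found = attack(z, perm)
print(len(found), any(p[0] == states[0] for p in found))
```

The recursion mirrors the text. A mismatch at some byte returns `None`, which sends the search back to the most recent branching point. A guess whose every branch fails is discarded, and the next 16-bit guess is tried.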

When we find a sufficiently long path of consistent guesses, we assume that the values for High8A(1), High8B(1) and High8K(1) were correct. This provides knowledge of the contents of the high eight stages of each of the three LFSRs at the time the first byte of keystream was produced. Consider the 24 consecutive guesses High8A(i), High8B(i) and High8K(i) for 2 ≤ i ≤ 25. For each of them, High8K(i) and High8A(i) give another bit in the state of LFSR_K and LFSR_A, respectively, and High8B(i) gives either one or two bits in the state of LFSR_B. Once we reconstruct the 32-bit state of each LFSR at the time the first keystream byte was produced, the LFSR states can be stepped back to recover the initial states of the three LFSRs: the secret key of the ORYX generator. Thus we recover the entire key using a minimum of 25 bytes of keystream and at most $2^{16}$ guesses. We use the recovered initial states to produce a candidate keystream and compare it to the known keystream. If the candidate keystream is the same as the known keystream, the attack ends; otherwise we make a new 16-bit guess and repeat the procedure.

In practice, we may occasionally need a few more than 25 keystream bytes 
to resolve ambiguities in the final few bits of the LFSR states. That is, we may 
need to chase down a few more false trails to convince ourselves we got the last 
few bits right. 



3.2 Testing Procedure 

The performance of the attack was experimentally analyzed to find the proportion of performed attacks for which the initial states can be successfully recovered, for various keystream lengths. The experiments use the following procedure. Nonzero initial states are generated for LFSR_A, LFSR_B and LFSR_K. A keystream segment of length N, $\{Z(i)\}_{i=1}^{N}$, is produced using the ORYX cipher as outlined in Section 2. The attack, as described in Section 3.1, is launched on the produced segment of the keystream. An outline of the testing procedure follows:

— Input: The length of the observed keystream sequence, N. 

— Initialization: i = 1, where i is the current attack index. Also define i_max, the maximum number of attack trials to be conducted.

... LFSR_A initial state seed index, j is the current LFSR_B initial state seed index and k is the current LFSR_K initial state seed index.

— Stopping Criterion: The testing procedure stops when the number of attacks conducted reaches i_max.

— Step 1: Generate pseudorandom initial state seeds ASEED_i, BSEED_i and KSEED_i for LFSR_A, LFSR_B and LFSR_K, respectively (the pseudorandom number routine drand48 is used; see [?]).

— Step 2: Generate pseudorandom LFSR initial states using ASEED_i, BSEED_i and KSEED_i for LFSR_A, LFSR_B and LFSR_K, respectively.

— Step 3: Generate the keystream sequence of bytes $\{Z(i)\}_{i=1}^{N}$.
— Step 4: Apply the attack to $\{Z(i)\}_{i=1}^{N}$ to obtain the reconstructions of the initial states of the three LFSRs.

— Step 5: If i < i_max, increment i and go to Step 1.

— Step 6: Stop the procedure. 

— Output: Reconstructed initial states of LFSR_A, LFSR_B and LFSR_K.
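Steps 1 and 2 of the testing procedure can be sketched in Python with a small model of the POSIX `drand48` generator. That generator is a 48-bit linear congruential generator with multiplier 0x5DEECE66D and increment 0xB; `srand48` places the seed in the high 32 bits and fixes the low 16 bits to 0x330E. How the original experiments turned drand48 outputs into seeds and 32-bit LFSR states is not stated. The `master_seed` scheme and the scaling used below are therefore hypothetical. Steps 3 and 4 would need a full ORYX implementation, which is not reproduced here.

```python
class Drand48:
    """Pure-Python model of POSIX srand48/drand48 (48-bit linear congruential)."""

    A = 0x5DEECE66D
    C = 0xB
    MASK = (1 << 48) - 1

    def __init__(self, seed):
        # srand48: high 32 bits from the seed, low 16 bits set to 0x330E.
        self.x = ((seed & 0xFFFFFFFF) << 16) | 0x330E

    def drand48(self):
        self.x = (self.A * self.x + self.C) & self.MASK
        return self.x / (1 << 48)


def nonzero_state(rng):
    """Hypothetical mapping from drand48 output to a nonzero 32-bit LFSR state."""
    while True:
        s = int(rng.drand48() * (1 << 32))
        if s:
            return s


def trial(i, master_seed=1998):
    """Steps 1-2 for attack index i; the seeding scheme is an assumption."""
    master = Drand48(master_seed + i)
    seeds = [int(master.drand48() * (1 << 32)) for _ in range(3)]  # ASEED, BSEED, KSEED
    return tuple(nonzero_state(Drand48(s)) for s in seeds)  # LFSR_A, LFSR_B, LFSR_K


for i in range(1, 4):
    print(i, [hex(s) for s in trial(i)])
```

Because every trial is derived deterministically from its index, any failed trial can be replayed exactly. That property is the main reason to drive such experiments from a seeded generator.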

4 Implementation Issues for the Attack 

The attack procedure described in Section 3.1 involves assuming that a particular guess of High8A(i), High8B(i) and High8K(i) is correct, using this guess to form predictions for the next keystream byte, and then comparing a known keystream byte with the predictions. If the keystream byte contradicts all predictions, we conclude that the guess was wrong. However, it is possible that the keystream byte Z(i + 1) will be the same as one of the predicted output bytes even though the values for High8A(i) and High8B(i) are incorrect. We refer to such a situation as a false alarm.



4.1 The Probability of a False Alarm 

For the attack to be effective, the probability of a false alarm occurring must be 
small. Therefore, we require a high probability that an incorrect guess will be 
detected (through comparison of the predicted outputs with the corresponding 
keystream byte). That is, we require the probability that no predicted output 
byte matches the actual keystream byte to be significantly greater than one 
half, given that the guessed contents of High8A(i), High8B(i) and High8K(i) are incorrect.

Consider the formation of the predicted output values. Each predicted output is formed from an 8-bit possible value for High8A(i + 1), an 8-bit possible value for High8B(i + 1) and an 8-bit possible value for High8K(i + 1). So there are a total of $2^{24}$ different input combinations. The predicted output has 8 bits. Therefore, given a particular output value, there exist multiple input combinations that result in this output. The inputs are all non-negative integers less than 256, and the combining function is the modulo 256 sum of High8K and the permuted values L[High8A] and L[High8B]. Hence all output values are equally likely if the input combinations are equally likely. Thus each output value is produced by $2^{16}$ different input combinations. If one of these input combinations is the correct combination, then there are $2^{16} - 1$ other combinations which, although incorrect, produce the same value. The probability that a single incorrect input combination produces the same output as the correct combination is $(2^{16} - 1)/(2^{24} - 1) \approx 0.0039$. The probability that a single incorrect input combination produces a value different from the output of the correct combination is the complement of this, approximately 0.9961. We select a set of twelve input combinations. The probability that an incorrect guess will be detected (through comparison of the predicted outputs with the corresponding keystream byte) is the probability that none of the predicted outputs is the same as the known keystream value, given that all twelve possible input combinations are incorrect. The situation can be approximated by the binomial distribution. Therefore,

$$P(\text{incorrect guess detected}) \approx (0.9961)^{12} = 0.9541$$

Since the probability that no predicted output byte matches the actual keystream byte, given that the guessed contents of High8A(i), High8B(i) and High8K(i) are incorrect, is 0.9541, the probability that at least one predicted output byte matches the actual keystream byte, under the same condition, is 0.0459. The probability of a false alarm is therefore less than five percent. A further useful property is that once we are on the wrong track, we have a 0.9541 probability of detecting this at each step.

Using the binomial distribution to calculate approximate probabilities, given that the guessed bits are incorrect,

$$P(\text{keystream byte matches 1 prediction}) \approx \binom{12}{1}(0.0039)^{1}(0.9961)^{11} = 0.0448$$

$$P(\text{keystream byte matches 2 predictions}) \approx \binom{12}{2}(0.0039)^{2}(0.9961)^{10} = 0.0010$$

$$P(\text{keystream byte matches more than 2 predictions}) < 0.0001$$

From this, we conclude that most of the time, we will generate very few false 
trails — typically just one or two. Thus, we perform a depth-first search of the 
possible states, but we seldom spend much time on a false trail. 
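The binomial figures above can be recomputed directly. The Python sketch below uses the exact per-combination match probability $(2^{16} - 1)/(2^{24} - 1)$ rather than the rounded 0.0039, so the last printed digit may differ slightly from the text.

```python
from math import comb

# Chance that one wrong input combination reproduces the keystream byte:
# 2^16 - 1 wrong combinations share the correct output, out of 2^24 - 1 wrong ones.
P_HIT = (2**16 - 1) / (2**24 - 1)
N_PRED = 12  # predicted outputs per iteration


def prob_matches(j, n=N_PRED, p=P_HIT):
    """Binomial probability that exactly j of n wrong predictions match."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)


detected = prob_matches(0)
print(f"P(detected)    = {detected:.4f}")
print(f"P(false alarm) = {1 - detected:.4f}")
print(f"P(1 match)     = {prob_matches(1):.4f}")
print(f"P(2 matches)   = {prob_matches(2):.4f}")
print(f"P(>2 matches)  = {1 - sum(prob_matches(j) for j in range(3)):.2e}")
```

The probability of three or more matches comes out at roughly $10^{-5}$. This supports the observation that a wrong branch rarely survives more than a step or two.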

Note that if we have the correct states for High8A(i), High8B(i) and High8K(i), we never mistakenly think we have the wrong state. Once we identify the correct High8A(1), High8B(1) and High8K(1), we can quickly find the correct states for High8A(i), High8B(i) and High8K(i) for 2 ≤ i ≤ n, for some n ≥ 25. From these we can reconstruct the initial states of the three LFSRs.



4.2 Effect of Length of Known Keystream 

The minimum length of keystream required for this attack to be successful is 25 bytes. One byte is needed to obtain the required eight-bit value for High8K(1), which gives eight known bits in each of the three 32-bit LFSR initial states. One further byte is then needed to recover each of the other 24 bits in the three LFSR initial states. The more keystream available, the more certain we are of successful reconstruction. However, if we have fewer than 25 bytes of known keystream, the attack can still be performed as outlined above to give a likely reconstruction of most of the LFSR states, and we use exhaustive search over the contents of the last few stages.




Cryptanalysis of ORYX 






N     Success (%)
25    99.7
26    99.9
27    100.0

Table 1. Success rate (%) versus N.



5 Experimental Results 

The performance of the attack was experimentally analyzed to find the proportion of performed attacks for which the initial states can be successfully recovered, for the keystream lengths N = 25, 26 and 27. For each keystream length, the attack was performed one thousand times, using pseudorandomly generated LFSR initial states. The attack was considered successful if the reconstructed LFSR initial states were the same as the actual LFSR initial states. Table 1 shows the success rate, i.e. the proportion of attacks that were successful, for each value of N.

From Table 1 we observe that even for the minimum keystream length, N = 25, the attack is usually successful. In a small number of cases, there exist multiple sets of LFSR initial states that produce the required keystream segment, and the attack cannot identify the actual states used. However, as noted earlier, only a small increase in keystream length is required to eliminate these additional candidates.



6 Ciphertext-only Attacks 

In many cases, the known-plaintext attack on ORYX can be extended to a 
ciphertext-only attack if some knowledge of the plaintext statistics is assumed. 
For example, when the plaintext is English text or other stereotyped data, 
ciphertext-only attacks are likely to be feasible with just hundreds or thousands 
of bytes of ciphertext. 

To perform a ciphertext-only attack we start by identifying a probable string 
of at least seven characters; a word or phrase which is likely to appear in the 
plaintext. Examples of probable strings include “login:␣” and “.␣␣The␣”. We
then slide the probable string along each ciphertext position, hoping to find a 
“match” with the correct cleartext message. 

If we align a probable plaintext string correctly, then we obtain a segment of 
the keystream with length equal to the length of the probable plaintext string. 
The known-plaintext attack described above can be performed on this keystream 
segment. If every path of guesses is ruled out by the end of the N = 7 bytes of 
known text, then we know the probable string does not match the cleartext at 
this position. Otherwise, we conclude that we have found a valid match; this may 
sound optimistic, but we show next that the probability of error is acceptably 
low. 
With this procedure, false matches should be rare, because false paths of guesses are eliminated very quickly. After analyzing the first byte, $2^{16}$ possibilities for the high bytes of each register remain. From Section 4.1, only 0.0459 of the wrong possibilities remain undetected after the second byte; of those, the proportion that remain undetected after the third byte is again 0.0459; and so on. This means that, on average, only $2^{16} \cdot (0.0459)^6 = 0.00061 \approx 2^{-10.7}$ wrong possibilities are expected to survive the tests once N = 7 bytes of known text have been considered. The probability of a false match being accepted at a given ciphertext position is therefore at most $0.00061 \approx 2^{-10.7}$. Consequently, with less than a thousand bytes of ciphertext, we expect fewer than one false match on average for probable strings of length N = 7. Using a slightly longer probable word will further reduce the probability of error.

The search for ciphertext locations that yield a valid match with the probable word can be performed quite efficiently. It should take about $2^{16}$ work, on average, to check each ciphertext position for a possible match. With less than a thousand bytes of ciphertext, the total computational effort to test a probable word is less than $2^{26}$, and thus even a search with a dictionary of probable words is easily within reach.
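The estimates in the two paragraphs above amount to a few lines of arithmetic. The Python sketch below reproduces them. The figure of 1000 ciphertext positions stands in for "less than a thousand bytes".

```python
import math

SURVIVE = 0.0459   # chance a wrong path passes one more byte (Section 4.1)
GUESSES = 2**16    # possibilities left after the first byte
N = 7              # probable-string length in bytes
POSITIONS = 1000   # ciphertext positions tried (about a thousand bytes)

wrong_per_position = GUESSES * SURVIVE ** (N - 1)
print(f"{wrong_per_position:.5f} ~ 2^{math.log2(wrong_per_position):.1f}")
print(f"expected false matches: {POSITIONS * wrong_per_position:.2f}")
print(f"work per probable word: 2^{math.log2(POSITIONS * GUESSES):.2f}")
```

About 0.6 false matches are expected over a thousand positions. This is why a match should still be confirmed through the key-recovery step that follows.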

Next, we describe how to use matches with the probable word to recover the ORYX key material. Each valid match provides $8 + (N - 1) = 14$ bits of information on the initial states of LFSR_A and LFSR_K. It also provides $8 + 1.5 \cdot (N - 1) = 17$ bits of information on the initial state of LFSR_B (on average, since LFSR_B is stepped once or twice per byte). Therefore, with three probable word matches, we will have accumulated about 42 bits of information on each of the 32-bit keys for LFSR_A and LFSR_K, and 51 bits of information on the 32-bit key for LFSR_B. The key can then be easily recovered by solving the respective linear equations over GF(2). Alternatively, with two matches, we have 28 bits of information for LFSR_A and LFSR_K, and 34 bits of information for LFSR_B. An exhaustive search over the remaining eight unknown bits should suffice to find the entire key with about $2^8$ trials. For each key trial, we can decrypt the ciphertext and check whether the result looks like plausible plaintext, using simple frequency statistics or more sophisticated techniques.
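The bit counting above can be checked with a short Python sketch. It treats every learned bit as independent, an idealization the text also makes when it says "about". It counts LFSR_B at its average of 1.5 bits per byte after the first.

```python
def bits_per_match(n):
    """Bits learned from one n-byte match: (LFSR_A or LFSR_K, LFSR_B on average)."""
    return 8 + (n - 1), 8 + 1.5 * (n - 1)


def unknown_bits(matches, n=7, width=32):
    """Key bits left for exhaustive search, treating every learned bit as independent."""
    ak, b = bits_per_match(n)
    return 2 * max(0, width - matches * ak) + max(0, width - matches * b)


print(bits_per_match(7))                 # (14, 17.0)
print(unknown_bits(2), unknown_bits(3))  # 8 0
```

With two matches, four bits are missing in each of LFSR_A and LFSR_K, which gives the $2^8$ trials mentioned above. With three matches, every register is overdetermined.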

As long as we have enough ciphertext and can identify some set of probable 
words, it should be easy to find two or three matches and thus recover the entire 
ORYX key. In other words, it appears that even ciphertext-only attacks against 
ORYX have relatively low complexity, when some knowledge of the plaintext 
statistics is available. The computational workload and the amount of ciphertext 
required are modest, and these attacks are likely to be quite practical. 

7 Summary and Conclusions 

ORYX is a simple stream cipher proposed for use as a keystream generator 
to protect cellular data transmissions. The known plaintext attack on ORYX 
presented in this paper is conducted under the assumption that the cryptanalyst 
knows the complete structure of the cipher and the 96-bit secret key is only the 
initial states of the component LFSRs. The attack requires exhaustive search 
over 16 bits, and has over 99 percent probability of success if the cryptanalyst 
knows 25 bytes of the keystream. The probability of success is increased if the
cryptanalyst has access to more than 25 bytes of the keystream. In our trials, 
a keystream length of 27 bytes was sufficient for the attack to correctly recover 
the key in every trial. Furthermore, we have shown how to extend this to a 
ciphertext-only attack which is likely to be successful with only hundreds or 
thousands of bytes of known ciphertext. 

These results indicate that the ORYX algorithm offers a very low level of 
security. The results further illustrate the low level of security offered in most 
second generation mobile telephone devices. The authors are of the opinion that, 
in most cases, this is due to the lack of public scrutiny of the cryptographic algo- 
rithms prior to their adoption for widespread use. It is to be hoped that the past 
reliance on security through obscurity will not be repeated in the cryptographic 
algorithms to be used in the third generation of mobile communications systems, 
due for use early in the twenty-first century. 
Abstract. This paper describes a timing attack on the RC5 block encryption algorithm. The analysis is motivated by the possibility that some implementations of RC5 could result in the data-dependent rotations taking a time that is a function of the data. Assuming that encryption timing measurements can be made which enable the cryptanalyst to
deduce the total amount of rotations carried out during an encryption, 
it is shown that, for the nominal version of RC5, only a few thousand 
ciphertexts are required to determine 5 bits of the last half-round subkey 
with high probability. Further, it is shown that it is practical to deter- 
mine the whole secret key with about encryption timings with a time 
complexity that can be as low as 2^®. 

Keywords: Cryptanalysis, Timing Attacks, Block Cipher. 



1 Introduction 

RC5 is an iterative secret-key block cipher invented by R. Rivest [?]. It has variable parameters such as the key size, the block size and the number of rounds. A particular (parameterized) RC5 algorithm is designated as RC5-w/r/b, where w is the word size (one block is made of two words), r is the number of rounds, and b is the number of bytes for the secret key. Our attack works for every choice of these parameters. However, we will focus on the "nominal" choice for the algorithm, RC5-32/12/16, which has a 64-bit block size, 12 rounds and a 128-bit key.

The security of RC5 relies on the heavy use of data-dependent rotations. 
The application of the two powerful attacks of differential and linear cryptanalysis to RC5 is considered by Kaliski and Yin [?], who show that the 12-round nominal cipher appears to be secure against both attacks. In [?], Knudsen and Meier extend the analysis of the differential attacks of RC5 and show that, by searching for appropriate plaintexts to use, the complexity of the attack can be reduced by a factor of up to 512 for a typical key of the nominal RC5. As well, it is shown that keys exist which make RC5 even weaker against differential cryptanalysis. Recently, in [?], new differential cryptanalysis results imply that 16 rounds are required for the cipher with w = 32 to be secure. The results on linear cryptanalysis are refined by Selcuk in [?] and, in [?], it is shown that a small fraction of keys results in significant susceptibility to linear cryptanalysis. Despite these results, from a practical perspective RC5 seems to be secure against both differential and linear cryptanalysis.

Kocher introduces the general notion of a timing attack. The attack
attempts to reveal the key by making use of information on the time it takes to
encrypt. The applicability of the attack to asymmetric systems is demonstrated
by examining timing variations for modular exponentiation operations. As noted
by Kocher, implementations of RC5 on processors which do not execute the
rotation in constant time are at risk from timing attacks. We will show that for
implementations where the rotations take a variable amount of time, linear in
the number of left shifts, RC5 is vulnerable to timing attacks which recover the
extended secret key table with only about 2^20 ciphertexts, using only knowledge of
the total amount of rotations carried out during each encryption.

2 Description of Cipher 

RC5 works as follows: the secret key is first extended into a table of 2r + 2 secret
words S_i of w bits. We will assume that this key schedule algorithm is essentially
one-way and will therefore focus on recovering the extended secret key table and
not the secret key itself. The description of the key schedule can be found in
Rivest's original description of RC5.

Let (L_0, R_0) denote the left and right halves of the plaintext. Then the
encryption algorithm is given by:

$$
\begin{aligned}
L_1 &= L_0 + S_0 \\
R_1 &= R_0 + S_1 \\
&\textbf{for } i = 1 \textbf{ to } 2r \textbf{ do} \\
&\qquad L_{i+1} = R_i \\
&\qquad R_{i+1} = \big((L_i \oplus R_i) \lll R_i\big) + S_{i+1}
\end{aligned}
\tag{1}
$$

where "+" represents addition modulo 2^w, "⊕" represents bit-wise exclusive-or,
and "X <<< Y" is the rotation of X to the left by the log2 w least significant bits
of Y. For example, if w = 32, X is rotated to the left by the number of positions
indicated by the value of Y mod 32.

The ciphertext is (L_{2r+1}, R_{2r+1}). The transformation performed for a given
i value is called a half-round: there are 2r half-rounds. Each half-round involves
exactly one data-dependent rotation and one subkey, S_{i+1}.

To decrypt, the operations of the algorithm must be appropriately reversed 
to generate the data for each half-round by essentially going backwards through 
the encryption algorithm. For example, data is rotated right and the subkeys 
are applied by subtracting modulo 2^w from the data.
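To make the notation of equation (1) concrete, the following minimal Python sketch encrypts and decrypts one RC5-32 block given an already expanded key table S of 2r + 2 words (the key schedule is omitted, as in the text). It also records the rotation amount R_i mod w of every half-round, which, under the hypothesis of section 3.2, is what the timing measurement leaks in aggregate. The function names are illustrative only and do not come from any library.

```python
W = 32                   # word size w in bits (RC5-32)
MASK = (1 << W) - 1      # all additions are modulo 2^w


def rotl(x, y):
    """Rotate the w-bit word x left by (y mod w) positions."""
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def rotr(x, y):
    """Rotate the w-bit word x right by (y mod w) positions."""
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


def rc5_encrypt(S, L0, R0, r=12):
    """Encrypt (L0, R0) with an expanded key table S = [S_0, ..., S_{2r+1}].

    Follows equation (1) one half-round at a time and returns the ciphertext
    (L_{2r+1}, R_{2r+1}) together with the list [R_1 mod w, ..., R_{2r} mod w]
    of data-dependent rotation amounts.
    """
    L = (L0 + S[0]) & MASK
    R = (R0 + S[1]) & MASK
    rotations = []
    for i in range(1, 2 * r + 1):
        rotations.append(R % W)                       # half-round i rotates by R_i mod w
        L, R = R, (rotl(L ^ R, R) + S[i + 1]) & MASK   # L_{i+1}, R_{i+1}
    return (L, R), rotations


def rc5_decrypt(S, L, R, r=12):
    """Undo rc5_encrypt: rotate right and subtract subkeys, half-round 2r down to 1."""
    for i in range(2 * r, 0, -1):
        R_i = L                                        # R_i = L_{i+1}
        L, R = rotr((R - S[i + 1]) & MASK, R_i) ^ R_i, R_i
    return (L - S[0]) & MASK, (R - S[1]) & MASK
```

In this sketch the last entry of `rotations` always equals the left ciphertext half modulo w, which is the observation used in section 3.3 to remove the last rotation from the measured total.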

In section 3 we describe the foundations of the timing attack and
give some preliminaries, and in section 4 we describe our timing attack as it is
used to obtain log2 w bits of the last half-round subkey. In section 5, we discuss
how to derive the remaining subkey bits and, in section 6, we present some
experimental results on the likelihood of the success of the attack. Finally, in
section 7, we discuss the complexity of the attack.

3 Preliminaries 

In this section, we describe our assumptions for the timing attack and show how 
to correlate the total amount of rotations carried out during encryption with the 
value of the second last rotation. 



3.1 Timing attacks 

For a complete description of Kocher's timing attacks we refer to Kocher's paper. The main
idea is to correlate the time measurements for processing different inputs with
the secret exponent in RSA or Diffie-Hellman-like systems. Since for every
non-zero bit in the secret exponent the whole process performs an extra modular
multiplication which depends on some computable intermediate value, correla- 
tions can be found between the variations of the time measurements on some 
sample space and the fact that an exponent bit is set or not. This way, as the 
distributions become more accurate, more and more bits can be derived and 
finally the whole secret key can be recovered. 

In symmetric-key cryptosystems, things tend to get more complicated, as
usually only constant-time operations such as additions, exclusive-ors or table
look-ups are performed. Nevertheless, under certain assumptions, cryptosystems
like RC5 can become vulnerable to timing attacks as well.



3.2 Hypothesis 

Rivest notes that "on modern microprocessors, a variable rotation ... takes
constant time", but there are certain types of processors which do not have this
property. For instance, on 8-bit microcontrollers for smart cards, or in other
constrained environments, the rotation has to be performed step by step, one left
or right shift at a time. It is not necessarily optimal to swap bytes or nibbles
depending on the number of left shifts, as testing the remainder of this number
modulo 16, 8 or 4 may take more time than doing all the shifts no matter what. When
machine cycles are measured during encryption, we can deduce from a certain
number of measurements how long the constant-time operations take and how
long the variable-time operations take: as these only concern rotations for RC5,
we can deduce the total amount of rotations for every encryption. We
believe a specific analysis can also endanger the algorithm if rotations are
carried out in a non-linear but still variable amount of time. It should also
be noted that a naive, straightforward hardware implementation could also
result in rotation times that are a function of the cipher data and, hence, create
susceptibility to the timing attack we describe in this paper.
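The hypothesis can be illustrated with a small timing model. The Python sketch below rotates a word one bit position at a time, as an implementation without a barrel shifter might, and counts the single-bit steps as a stand-in for machine cycles. The step counter is a hypothetical cost model for illustration, not a measurement of any real device.

```python
def rotl_stepwise(x, y, w=32):
    """Rotate the w-bit word x left by (y mod w), one bit position per step.

    Returns (result, steps). Under the hypothesis of this section, `steps`
    stands in for the data-dependent part of the running time.
    """
    mask = (1 << w) - 1
    steps = 0
    for _ in range(y % w):
        carry = (x >> (w - 1)) & 1        # bit shifted out on the left ...
        x = ((x << 1) & mask) | carry     # ... re-enters on the right
        steps += 1
    return x, steps


# In this model the cost is linear in the rotation amount: rotating by 5 or
# by 37 (= 5 mod 32) both take five single-bit steps, and rotating by 0 is free.
```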
In this paper, we focus on the case where we can assume that a left rotation
by an amount y takes y times longer than a left rotation by an amount of 1.
We show a ciphertext-only attack that recovers the extended secret key in a
reasonable amount of time when the total rotation amount of each encryption is known.
Kocher has already mentioned that "RC5 is at risk", and we show why. Note
that the imperfection and inherent randomness of timing measurements may
cause the complexity to grow and the required number of ciphertexts to be
much higher, but as the theoretical attack we present is of such low complexity,
it should raise serious doubts about the strength of RC5 if implemented such
that the rotation times are data dependent.



3.3 Foundation of the Attack 

Let T_{2r} denote the total rotation amount for the encryption of a given plaintext.
T_{2r} is given by:

$$ T_{2r} = \sum_{i=1}^{2r} (R_i \bmod w) \tag{2} $$

We note that the amount of the last rotation is known because it is the value
of the log2 w least significant bits of L_{2r+1}, which is the left half of the ciphertext.
Therefore let us consider the total amount of rotations minus the last rotation.
We denote this quantity by T_{2r−1} and

$$ T_{2r-1} = \sum_{i=1}^{2r} (R_i \bmod w) - (L_{2r+1} \bmod w). \tag{3} $$

More generally, for a half-round k, we can define the intermediate amount of
rotations so far and we have

$$ T_k = \sum_{i=1}^{k} (R_i \bmod w) \tag{4} $$

and

$$ T_{k-1} = \sum_{i=1}^{k} (R_i \bmod w) - (L_{k+1} \bmod w). \tag{5} $$

We also have the following:

$$ T_{k-1} = T_{k-2} + (R_{k-1} \bmod w) \tag{6} $$

Now let us consider the way the total amounts of rotations are distributed
over some large sample space. T_{k−2} can be represented by its mean value
(k − 2) × (w − 1)/2 added to some deviation. If this deviation (noise) is small, T_{k−1} and
R_{k−1} mod w are correlated at each half-round, and the distribution of T_{k−1} gives
us a good idea about what the distribution of R_{k−1} mod w should look like.
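This correlation can be checked numerically under the model's own assumption that every rotation amount is an independent value, uniformly distributed over {0, ..., w − 1}. Since T_{k−1} = T_{k−2} + (R_{k−1} mod w) with independent terms, the Pearson correlation between T_{k−1} and R_{k−1} mod w should be 1/√(k − 1). The sketch below draws synthetic rotation amounts (not RC5 data) and estimates it; the function name is hypothetical.

```python
import math
import random


def correlation_T_vs_last(k, w=32, samples=50_000, seed=0):
    """Estimate corr(T_{k-1}, R_{k-1} mod w) under the uniform, independent model."""
    rng = random.Random(seed)
    ts, rs = [], []
    for _ in range(samples):
        T_km2 = sum(rng.randrange(w) for _ in range(k - 2))   # T_{k-2}
        last = rng.randrange(w)                               # R_{k-1} mod w
        ts.append(T_km2 + last)                               # equation (6)
        rs.append(last)
    mt, mr = sum(ts) / samples, sum(rs) / samples
    cov = sum((t - mt) * (q - mr) for t, q in zip(ts, rs)) / samples
    vt = sum((t - mt) ** 2 for t in ts) / samples
    vr = sum((q - mr) ** 2 for q in rs) / samples
    return cov / math.sqrt(vt * vr)


# For the second last rotation of RC5-32/12 (k - 1 = 2r - 1 = 23) the model
# predicts a correlation of 1/sqrt(23), roughly 0.21.
```

The correlation is weak but nonzero, which is why the attack needs thousands of samples per category rather than a handful.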
This is the observation that leads to our timing attack. Knowledge of the
second last rotation amount gives us some knowledge about the last subkey. We 
describe the attack in detail in section 4 hereafter. 

4 Our Attack 

In this section we describe the first part of our attack, which is how to derive
the log2 w least significant bits of the last subkey, S_{2r+1}. We shall describe two
approaches to extracting these bits, which we shall refer to as Method A and
Method B. (The two methods were derived independently, and each is elaborated
further in a separate reference.) The fundamental difference between the approaches
is the indicator used as a metric to pick the most likely subkey.

4.1 Method A 

In the last section, we showed how the total amount of rotations T_{k−1} and a
given rotation R_{k−1} mod w are correlated. Now we need an indicator to quantify
this correlation. A natural choice is to use the following correlation
coefficient I as an indicator:

$$ I = E\Big\{(T_{k-1} - \mu_{k-1}) \times \Big(R_{k-1} \bmod w - \frac{w-1}{2}\Big)\Big\} \tag{7} $$

where μ_{k−1} is the mean of the distribution of the T_{k−1}'s over some sample space.

In fact, it is even more convenient to use only the sign of (R_{k−1} mod w − (w − 1)/2)
in order to partition the samples depending on their deviation from the mean
value. The indicator we shall therefore be using is the following:

$$ I = E\Big\{(T_{k-1} - \mu_{k-1}) \times \mathrm{Sign}\Big(R_{k-1} \bmod w - \frac{w-1}{2}\Big)\Big\} \tag{8} $$

In the case of two correlated distributions, this indicator is expected to have
a higher absolute value than in the case of two independent distributions (when
our guess about the second last rotation amount is wrong).

The first phase of the attack is a sample generation phase. We collect triplets
consisting of a plaintext encrypted under the unknown key, its ciphertext,
and T_{2r−1}, the total amount of rotations minus the last one carried out
during encryption. These samples are stored in a table and are ordered by the
value of the log2 w least significant bits of the left half of the ciphertext. Recall
that this is also the value of the last rotation amount.

We denote by N the number of collected samples of total rotation amounts 
in each category. At each half-round we assume that the intermediate rotation 
amounts are uniformly distributed, independent random variables. For our
analysis to work, we need the standard deviation of the sample mean of the total
rotation amounts to be small over all half-rounds and all samples.

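Let X_i be the rotation amount of the i-th sample at an intermediate half-round,
and write X_{i,j} for its value at half-round j. Over all half-rounds and all
samples, we have:

$$ \mathrm{Var}\Big(\sum_{j=1}^{2r} X_{i,j}\Big) = 2r \times \mathrm{Var}(X_i) = 2r \times \mathrm{Var}(X_0) \tag{9} $$

and

$$ \mathrm{Var}\Big(\sum_{i=1}^{N} \sum_{j=1}^{2r} X_{i,j}\Big) = 2r \times N \times \mathrm{Var}(X_0) \approx 2r \times N \times \frac{w^2}{12}. \tag{10} $$

The standard deviation σ of the sample mean Y of the total rotation amounts is given by:

$$ \sigma^2 = \mathrm{Var}(Y) = \mathrm{Var}\Big(\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{2r} X_{i,j}\Big) \approx \frac{2r}{N} \times \frac{w^2}{12}. \tag{11} $$

An order of magnitude of the number N of samples needed is given by the
condition:

$$ \sigma < 1 \tag{12} $$

which gives us the following condition for N:

$$ N > 2r \times \frac{w^2}{12}. \tag{13} $$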
Therefore, for RC5-32/12/16, we need N to be much greater than 2^11 in each category. From
a practical point of view, we implemented our attack with 2^15 samples in each
category; there are w different categories, therefore the total number of samples
required is 2^15 × w = 2^20. We will keep this upper bound in our complexity
evaluation in section 7, as it is convenient for practical implementations. (Actually,
the practical attack requires considerably more time and plaintexts than suggested by
the theory, so this approximation fits the experiments much better.)
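As a worked instance of conditions (11) and (13), using the approximation Var(X_0) ≈ w²/12 and the nominal parameters w = 32 and r = 12 (the second part evaluates σ at the sample size of 2^15 per category quoted above):

$$
N > 2r \times \frac{w^2}{12} = 24 \times \frac{1024}{12} = 2048 = 2^{11},
\qquad
\sigma^2 \approx \frac{2r}{N} \times \frac{w^2}{12} = \frac{24 \times 1024}{12 \times 2^{15}} = \frac{1}{16}
\ \Rightarrow\ \sigma \approx 0.25 \text{ for } N = 2^{15}.
$$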

Now we have:

$$ R_{2r+1} = \big((R_{2r-1} \oplus L_{2r+1}) \lll L_{2r+1}\big) + S_{2r+1} \tag{14} $$

In this last equation, R_{2r+1} is the right half of the ciphertext and L_{2r+1} is
the left half of the ciphertext. Therefore the right value of R_{2r−1} gives us the
right value of S_{2r+1}.

We concentrate on the category of samples such that the log2 w least
significant bits of the left half of the ciphertext are equal to zero. In particular, this
means that the last rotation amount is also equal to zero. Therefore we have the
following equation:

$$ (R_{2r+1} - S_{2r+1}) \bmod w = R_{2r-1} \bmod w \tag{15} $$

For each possible value of the log2 w least significant bits of S_{2r+1}, we compute
the supposed second last rotation amount R_{2r−1} mod w for each sample in
the category we concentrate on. This gives us a trial distribution over the 2^15
samples.

Now divide the samples into two parts depending on the sign of the guessed
rotations relative to (w − 1)/2. Compute the correlation coefficients I+ and I− on
each of the two parts. On the negative part, the correlation coefficient is supposed
to be negative, and on the positive part, it is supposed to be positive. Finally compute
the quantity I+ − I−. This indicator should have a higher value when the two
distributions are correlated than when they are independent. The right value of
the log2 w least significant bits of the last subkey is suggested by the highest
indicator.
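The following minimal Python sketch runs the Method A selection rule on synthetic category-0 samples generated from the model of this section (independent, uniform rotation amounts), not on real RC5 timings. It assumes w = 32 and 2r − 1 = 23 rotations before the last one, takes a hypothetical partial key K, and for every guess g evaluates I+ − I−, where I+ and I− are the mean of (T − μ) over the samples whose guessed rotation lies above or below (w − 1)/2. Both helper names are hypothetical.

```python
import random


def synth_category0(K, n, w=32, before_last=23, seed=0):
    """Model-based stand-in for category-0 samples (L_{2r+1} mod w = 0).

    Each sample is (x, T): x = (R_{2r-1} + K) mod w plays the role of
    R_{2r+1} mod w (equation (15)), and T = T_{2r-1} is the total of the
    2r - 1 rotations before the last one, all drawn uniformly at random.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        rots = [rng.randrange(w) for _ in range(before_last)]
        samples.append(((rots[-1] + K) % w, sum(rots)))
    return samples


def method_a_guess(samples, w=32):
    """Return the guess of S_{2r+1} mod w with the largest I+ - I- indicator."""
    mu = sum(t for _, t in samples) / len(samples)
    centre = (w - 1) / 2
    best_g, best_ind = None, float("-inf")
    for g in range(w):
        pos, neg = [], []
        for x, t in samples:
            guessed_rot = (x - g) % w                      # equation (15)
            (pos if guessed_rot > centre else neg).append(t - mu)
        ind = sum(pos) / len(pos) - sum(neg) / len(neg)    # I+ - I-
        if ind > best_ind:
            best_g, best_ind = g, ind
    return best_g
```

With 2^15 samples drawn from this idealized model the correct partial key should win by a wide margin; on real RC5 data the dependencies discussed in section 6 make success less certain.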

4.2 Method B 

In this section, we describe another approach to extracting the log2 w least
significant bits of the last subkey and, using a probabilistic model, we get an
estimate of the number of ciphertexts required to determine the bits with high
probability. For convenience, we shall strictly focus our attention on the cipher
with w = 32 and r = 12. Similar to the previous section, the model assumes
that the rotations in each half-round are independent random variables that are
uniformly distributed over {0, 1, 2, ..., 31} with a mean of 15.5 and a variance
of 85.25. Under these assumptions, the sum of the number of rotations for the
first 22 half-rounds of the cipher, T_22, is a random variable with a mean of
μ_22 = 22 · 15.5 = 341 and a variance of σ²_22 = 22 · 85.25 = 1875.5. As well, based
on the central limit theorem, T_22 is approximately Gaussian distributed.

To determine the correct value of the partial subkey S_25 mod 32, a number
of ciphertexts is used to test each possible value for the 5 bits and to determine
which value is most consistent with the expected statistical distribution of the
timing information. In particular, ciphertexts for which L_25 mod 32 = 0 are
used to compute an estimate of the variance of T_22 based on the value of each
candidate partial subkey: it is expected that the variance estimate when the
correct value for the partial subkey is selected will be smaller than the estimate
when an incorrect partial subkey is assumed.

Let K represent the actual key bits S_25 mod 32 and let K̃ represent the guess
of the partial subkey K. The candidate key K̃ can be represented by

$$ \tilde{K} = (K + \tau) \bmod 32 \tag{16} $$

where −15 ≤ τ ≤ 16. The estimate of the variance of T_22 for a particular
candidate key K̃ is given by

$$ \phi_\tau = E\{\epsilon_{\tau,x}^2\} \tag{17} $$

where ε_{τ,x} represents the difference between the measured number of rotations
for the entire 24 half-rounds and the expected number of rotations given the
assumed candidate key for a ciphertext with R_25 mod 32 = x. The difference
ε_{τ,x} is composed as follows:

$$ \epsilon_{\tau,x} = (T_{22} + R) - (\mu_{22} + R_{\tau,x}) \tag{18} $$

where R represents the actual value of the rotation in the 23rd half-round (i.e.,
R = R_23 mod 32) and R_{τ,x} represents the guess of the rotation in the 23rd
half-round (corresponding to the candidate key K̃) given R_25 mod 32 = x. Using
ciphertexts for which L_25 mod 32 = 0, the size of the rotation in the 24th
half-round is 0. Therefore, T_22 + R equals the total number of rotations, which,
of course, can be derived from the timing information. The value of R_{τ,x} is
determined from R_{τ,x} = (x − K̃) mod 32. Hence, for a given ciphertext and
candidate key, the value of ε_{τ,x} can be calculated. We can also view equation
(18) by letting ΔT = T_22 − μ_22 and ΔR_{τ,x} = R − R_{τ,x}, and we get

$$ \epsilon_{\tau,x} = \Delta T + \Delta R_{\tau,x}. \tag{19} $$

Now assume that the cryptanalyst has available N ciphertexts for which
L_25 mod 32 = 0 and, hence, for large N, the number of ciphertexts corresponding
to a particular value x for R_25 mod 32 is given by N_x ≈ N/32. For each ciphertext
and candidate key K̃, ε_{τ,x} is computed and the mean of the square of ε_{τ,x} is
calculated. The result is equivalent to

$$ \phi_\tau = \frac{1}{N} \sum_{x=0}^{31} \sum_{i=1}^{N_x} (\Delta T_{x,i} + \Delta R_{\tau,x})^2 \tag{20} $$

where ΔT_{x,i} represents the i-th value of ΔT for R_25 mod 32 = x. For the
correct guess of the key (i.e., τ = 0), ΔR_{0,x} = 0 since R_{0,x} = R and, hence,
φ_0 = (1/N) Σ_{x=0}^{31} Σ_{i=1}^{N_x} ΔT²_{x,i}. For incorrect candidate keys, for which |τ| ≥ 1,
ΔR_{τ,x} ≠ 0, and it can be shown that E{φ_τ} > E{φ_0}. The cryptanalyst can
therefore collect ciphertexts and timing information (implying rotation
information) and determine the key K by picking the candidate key K̃ which
minimizes φ_τ.
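A corresponding minimal sketch of the Method B rule: for each candidate K̃ it forms ε_{τ,x} as in equation (18) and keeps the candidate that minimizes φ_τ, the mean of ε². As in the Method A illustration, the samples are synthetic draws from the model (22 uniform rotations for T_22, a uniform 23rd rotation and a zero 24th rotation), not real RC5 timings, and the helper names are hypothetical.

```python
import random

MU_22 = 22 * 15.5      # mean of T_22 under the model (341)


def synth_samples(K, n, seed=0):
    """Model-based samples (x, total) for ciphertexts with L_25 mod 32 = 0.

    total = T_22 + R stands in for the timing-derived rotation count (the
    24th rotation is 0 in this category), and x = R_25 mod 32 = (R + K) mod 32.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        T22 = sum(rng.randrange(32) for _ in range(22))
        R = rng.randrange(32)                      # R_23 mod 32
        samples.append(((R + K) % 32, T22 + R))
    return samples


def method_b_guess(samples):
    """Return the candidate K~ that minimizes phi (equations (17) and (18))."""
    def phi(K_tilde):
        acc = 0.0
        for x, total in samples:
            R_guess = (x - K_tilde) % 32           # R_{tau,x}
            eps = total - (MU_22 + R_guess)        # epsilon_{tau,x}
            acc += eps * eps
        return acc / len(samples)
    return min(range(32), key=phi)
```

With N = 2000 samples the analysis that follows predicts success above 93.5% under the model; the check below uses more samples so that, within the model, a wrong answer is practically impossible.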

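We now determine the probability that an incorrect key is selected over
the correct key. For this to happen, we must have φ_τ < φ_0 or, alternatively,
φ_τ − φ_0 < 0. Hence, the cryptanalyst must acquire enough ciphertext timings to
ensure that φ_τ − φ_0 > 0 for all τ ≠ 0. From (20) it can be seen that

$$ \phi_\tau - \phi_0 = \frac{1}{N} \sum_{x=0}^{31} \sum_{i=1}^{N_x} \big[\, 2\, \Delta T_{x,i}\, \Delta R_{\tau,x} + (\Delta R_{\tau,x})^2 \,\big] \tag{21} $$

We can consider φ_τ − φ_0 to be a Gaussian distributed random variable with an
expected value given by

$$ E\{\phi_\tau - \phi_0\} \approx \frac{1}{32} \sum_{x=0}^{31} (\Delta R_{\tau,x})^2 \tag{22} $$

where we have used N_x ≈ N/32 and E{ΔT} = 0. As well, φ_τ − φ_0 has a variance
given by

$$ \mathrm{Var}\{\phi_\tau - \phi_0\} = \frac{1}{N^2} \sum_{x=0}^{31} \sum_{i=1}^{N_x} E\big[(2\, \Delta R_{\tau,x})^2\, \Delta T^2\big] \approx \frac{\sigma_{22}^2}{8N} \sum_{x=0}^{31} (\Delta R_{\tau,x})^2 \tag{23} $$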
It can be shown that

$$ \sum_{x=0}^{31} (\Delta R_{\tau,x})^2 = |\tau|(32 - |\tau|)^2 + (32 - |\tau|)|\tau|^2 = 32|\tau|(32 - |\tau|) \tag{24} $$

and, consequently, it can be easily verified that

$$ \max_{\tau \neq 0} P(\phi_\tau - \phi_0 < 0) = P(\phi_\omega - \phi_0 < 0) \tag{25} $$

where ω = −1 or +1. For N = 2000 ciphertexts for which L_25 mod 32 = 0,
based on the Gaussian distribution, we get P(φ_1 − φ_0 < 0) = 0.0021 and the
probability of being able to determine the correct 5 bits of subkey, S_25 mod 32,
is given by

$$ P(S_{25}[4 \ldots 0] \text{ correct}) = 1 - P(\exists\, \tau \neq 0 : \phi_\tau < \phi_0) \tag{26} $$

where

$$ P(\exists\, \tau \neq 0 : \phi_\tau < \phi_0) < 31 \cdot P(\phi_1 - \phi_0 < 0) = 0.0651. \tag{27} $$

Therefore, the probability of picking the correct 5 bits of subkey is greater than
93.5% with 2000 ciphertexts, under the assumption that the rotations in all
half-rounds are independent. Note that only ciphertexts with L_25 mod 32 = 0
can be used, which is true on average for 1 in 32 random ciphertexts. Hence,
the correct 5 bits of subkey can be derived with high probability using about
64000 random ciphertexts and their timings.
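The figures above can be reproduced from equations (22) to (27). The Python sketch below enumerates ΔR_{τ,x} over the 32 possible rotation values to obtain the sum in (24), plugs it into (22) and (23) with σ²_22 = 1875.5, and evaluates the Gaussian tail with the standard-library `math.erfc`. Because it keeps the unrounded tail probability, its union bound comes out slightly below the rounded 0.0651 in the text.

```python
import math

SIGMA2_22 = 22 * 85.25          # variance of T_22 under the model (1875.5)


def sum_sq_delta_R(tau):
    """Sum over x of (Delta R_{tau,x})^2, by enumerating R = R_23 mod 32.

    For a fixed key, x = (R + K) mod 32 is a bijection, so summing over R
    is the same as summing over x.
    """
    total = 0
    for R in range(32):
        R_guess = (R - tau) % 32        # rotation implied by K~ = (K + tau) mod 32
        total += (R - R_guess) ** 2
    return total


def p_wrong_beats_right(tau, N):
    """Gaussian estimate of P(phi_tau - phi_0 < 0) from equations (22) and (23)."""
    s = sum_sq_delta_R(tau)
    mean = s / 32
    var = SIGMA2_22 * s / (8 * N)
    z = mean / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))   # P(Z < -z) for a standard normal Z


N = 2000
taus = [t for t in range(-15, 17) if t != 0]
worst = max(taus, key=lambda t: p_wrong_beats_right(t, N))   # equation (25)
p_worst = p_wrong_beats_right(worst, N)
union_bound = 31 * p_worst                                   # equation (27)
print(worst, round(p_worst, 4), round(union_bound, 4))
```

The enumeration also confirms the closed form 32|τ|(32 − |τ|) of equation (24) for every τ ≠ 0, and that the most dangerous wrong candidates are the neighbours τ = ±1.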

As we shall see in section 6, in fact, there are some dependencies in the 
rotations of different half-rounds which result in the probabilities of successfully 
deriving the key in practice being somewhat lower than expected from the model. 
Nevertheless, experimental evidence confirms that the approach works well when 
applied to the actual cipher. 

5 Deriving the Remaining Subkey Bits

In the previous section, we illustrated how it is possible to determine log2 w bits
of the last half-round subkey S_{2r+1} with high probability using a set of random
ciphertexts and their timing information. Fortunately, it is straightforward to
apply the techniques on the same ciphertexts to determine the remaining bits of
subkey S_{2r+1} and, with enough ciphertexts, it is possible to derive all the bits
of the subkeys S_i, 3 ≤ i ≤ 2r, as well.

First, now that we have found the log2 w least significant bits of S_{2r+1}, we
have to derive the w − log2 w other bits of the last subkey. We proceed as
follows:

Based on the categories described in section 4.1, concentrate on one category
of samples at a time, in increasing order. Depending on the value of the log2 w
least significant bits of the left half of the ciphertext, the right half was rotated by
some amount to the left. Therefore the right value of R_{2r−1} mod w gives us the right
value of the log2 w bits of the last subkey which correspond to the bit
positions i = (L_{2r+1} mod w) through i = (L_{2r+1} mod w) + log2 w − 1.

However, we proceed bit by bit in order to take advantage of the knowledge
we already have of the least significant bits of the subkey. For each new rotation
amount from 1 to w − log2 w, try each of the two possible values of the log2 w
concerned bits of the subkey: take the log2 w − 1 bits you already know and guess 0
or 1 for the next bit. This gives only two possible distributions for R_{2r−1} mod
w in each category of samples. Using the indicator of either Method A or Method
B, determine the targeted key bit. The right value of the log2 w concerned bits
of the last subkey is suggested by the corresponding best indicator.
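Why a single new bit per category suffices follows from the borrow structure of the subtraction R_{2r+1} − S_{2r+1}. For a category c = L_{2r+1} mod w with 0 ≤ c ≤ w − log2 w, the guessed second last rotation R_{2r−1} mod w = (((R_{2r+1} − S_{2r+1}) >>> c) ⊕ L_{2r+1}) mod w depends only on bits 0 through c + log2 w − 1 of S_{2r+1}, because borrows propagate only towards higher bit positions. The Python sketch below (w = 32, hypothetical helper names) computes that guess and checks the property on random words.

```python
import random

W, LOGW = 32, 5
MASK = (1 << W) - 1


def rotr(x, y):
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


def guessed_second_last_rotation(L, R, S_guess):
    """R_{2r-1} mod w implied by ciphertext (L, R) and a guess of S_{2r+1}."""
    return (rotr((R - S_guess) & MASK, L) ^ L) % W


def only_low_bits_matter(trials=1000, seed=0):
    """Check that bits above c + log2(w) - 1 of the guess never change the result."""
    rng = random.Random(seed)
    for _ in range(trials):
        c = rng.randrange(W - LOGW + 1)                    # category c = L mod w
        L = (rng.getrandbits(W) & ~(W - 1) & MASK) | c
        R, S = rng.getrandbits(W), rng.getrandbits(W)
        keep = (1 << (c + LOGW)) - 1                       # bits 0 .. c + log2(w) - 1
        S_other = (S & keep) | (rng.getrandbits(W) & ~keep & MASK)
        if (guessed_second_last_rotation(L, R, S)
                != guessed_second_last_rotation(L, R, S_other)):
            return False
    return True
```

Conversely, flipping bit c + log2 w − 1 of the guess always changes the guessed rotation, so each category discriminates exactly one new key bit once the lower bits are known.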

Once S_{2r+1} is derived, the remaining subkeys associated with each half-round
i, 2 ≤ i ≤ 2r − 1, may be determined using the same set of ciphertexts. Once
the subkey for a half-round i is determined, the ciphertext may be partially
decrypted for one half-round so that the output of half-round i − 1 is known.
Correspondingly, the timing of the partial encryption of the first i − 1 half-rounds
may be determined by subtracting the time to execute the i-th half-round from
the time to encrypt the first i half-rounds of the cipher. The new ciphertext and
timing information may then be used to extract the subkey for half-round i − 1
in exactly the same manner as for the subkey of half-round i.

The remaining subkeys, S_0, S_1, and S_2, are applied to the cipher by
addition to the plaintext left half, the plaintext right half, and the output of the
rotation operation in the first half-round. None of these three subkeys can be
determined using timing information, but they are trivially determined using only a
modest number of known plaintexts and ciphertexts: S_1 is simply determined
using one known plaintext and the relationship S_1 = L_2 − R_0, S_2 can be
determined with a modest number of known plaintexts using, for example, linear
cryptanalysis, and S_0 can be easily derived once S_1 and S_2 are determined
using S_0 = [((R_2 − S_2) >>> L_2) ⊕ L_2] − L_0, where (L_2, R_2) is obtained by partially
decrypting the ciphertext with the subkeys already recovered.
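The two closing relations can be checked with a short Python sketch (w = 32, hypothetical function names). It computes the first half-round forward from a plaintext and three subkeys as in equation (1), and then recovers S_1 and S_0 from the plaintext, (L_2, R_2) and S_2 exactly as described; in the attack, (L_2, R_2) would come from partial decryption rather than from the forward computation used here for testing.

```python
W = 32
MASK = (1 << W) - 1


def rotl(x, y):
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def rotr(x, y):
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


def first_half_round(S0, S1, S2, L0, R0):
    """(L_2, R_2) computed from the plaintext, following equation (1)."""
    L1, R1 = (L0 + S0) & MASK, (R0 + S1) & MASK
    return R1, (rotl(L1 ^ R1, R1) + S2) & MASK


def recover_S1(R0, L2):
    """S_1 = L_2 - R_0, since L_2 = R_1 = R_0 + S_1."""
    return (L2 - R0) & MASK


def recover_S0(L0, L2, R2, S2):
    """S_0 = [((R_2 - S_2) >>> L_2) xor L_2] - L_0, since R_1 = L_2."""
    return ((rotr((R2 - S2) & MASK, L2) ^ L2) - L0) & MASK
```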

6 Experimental Results 

In this section we present the experimental results which validate the
effectiveness of the attack. Both Methods A and B assume that the values of the rotations
in different half-rounds are uniformly distributed, independent random variables.
This assumption, however, is not strictly correct. Consider, for example, the
following scenario for w = 32 and r = 12: S_24 = 0 and S_25 mod 32 = 0. Suppose
the cryptanalyst is attempting to determine S_25 mod 32 and is therefore
considering ciphertexts for which L_25 mod 32 = 0. If R_25 mod 32 = x = 16, then
R_22 mod 32 = ((L_25 >>> 16) ⊕ 16) mod 32 and, since (L_25 >>> 16) mod 32 is a uniformly distributed
random variable, R_22 mod 32 behaves as anticipated by the model. However, if
R_25 mod 32 = x = 0, then R_22 mod 32 = 0 always, and R_22 mod 32 is not a
uniformly distributed random variable as suggested by the model.
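The scenario can be reproduced by undoing the last two half-rounds directly. The Python sketch below fixes hypothetical subkeys with S_24 = 0 and S_25 mod 32 = 0, draws random ciphertexts with L_25 mod 32 = 0 and a chosen x = R_25 mod 32, inverts half-rounds 24 and 23 as in RC5 decryption, and collects the distinct values of R_22 mod 32. The function name and the particular value of S_25 are illustrative assumptions.

```python
import random

W = 32
MASK = (1 << W) - 1


def rotr(x, y):
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


def r22_values(x, S24=0, S25=0xABCDE0, n=2000, seed=0):
    """Distinct values of R_22 mod 32 over random ciphertexts with L_25 mod 32 = 0.

    S24 = 0 and S25 mod 32 = 0 reproduce the scenario in the text; x fixes
    R_25 mod 32. Half-rounds 24 and 23 are undone as in RC5 decryption.
    """
    rng = random.Random(seed)
    seen = set()
    for _ in range(n):
        L25 = rng.getrandbits(W) & ~0x1F & MASK            # last rotation amount is 0
        R25 = (rng.getrandbits(W) & ~0x1F & MASK) | x       # R_25 mod 32 = x
        R24 = L25                                           # undo half-round 24 ...
        R23 = rotr((R25 - S25) & MASK, R24) ^ R24           # ... R_23 = L_24
        R22 = rotr((R24 - S24) & MASK, R23) ^ R23           # undo 23: R_22 = L_23
        seen.add(R22 % 32)
    return seen
```

For x = 0 the set is always {0}, while for x = 16 it is expected to cover all 32 residues, matching the two cases described above.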

These discrepancies from the model add inaccuracies to the process of
statistically deriving the subkeys for both Method A and Method B. However,
experimental results for Method B demonstrate that the cryptanalytic technique
is still very effective and that the statistical model of section 4.2 provides a rough
approximation of the effectiveness of the attack to determine 5 key bits of the last
half-round subkey. Using Method B, for 2000 ciphertexts with L_25 mod 32 = 0
(equivalent to about 64000 random ciphertexts), experiments on the nominal
RC5 for 1000 random keys resulted in an 86.2% chance of the partial subkey
S_25 mod 32 being correctly determined. Using 64000 random ciphertexts, the
complete subkey S_25 was correctly determined for 69.7% of the keys.

Table 1. Experimental Results for 1000 Keys on RC5-32/12/16 (Method B)

  Number of random     Probability of success
  ciphertexts          S_25 mod 32    S_25     S_3 ... S_25
  10^4                 0.611          0.083    0.000
  10^5                 0.893          0.794    0.009
  10^6                 0.901          0.827    0.024

The effectiveness of the timing attack as determined in experiments using
Method B is further illustrated in Table 1. It is clear from the table that relatively few
random ciphertexts are required to determine the bits of the last half-round
subkey with a high probability. The correct derivation of all subkeys S_3 ... S_25 does
not occur with nearly as high a probability: even modest deviations in
probability from 1 when determining individual subkeys significantly reduce the probability
that all 23 subkeys will be successfully determined. However, it is apparent that
the attack can be very effective in determining subkeys for a large fraction of
keys and should be seriously considered to ensure that an implementation of the
cipher is not vulnerable. Note that the probability of complete success for
the attack can be improved upon using a key path search approach described in
the following section.

7 Complexity of the attack 

In this section, we discuss the complexity of the attack in the context of Method
A (although much of the discussion applies equally well to Method B). The first
part of the attack is the sample generation phase. We need around 2^15 ciphertexts
in each category. There are w categories. Therefore the complexity of this phase
is 2^15 × w encryptions. The ordering has no extra cost.

The second part of the attack is divided into four steps as mentioned in 
section 4. From a complexity point of view, there are 2r — 2 half-rounds to be 
considered. For each half-round: 

- start by computing the mean of the total rotation amounts in each category
and subtracting this mean from each total timing. This step is linear in the
total number of samples, 2^15 × w.

- then, computing the log2 w least significant bits of a subkey takes 2^15 × w
operations.

- each further bit takes 2^15 × 2 operations, and there are w − log2 w such bits.

- finally, before entering the next half-round, decrypt one half of the ciphertext
with the subkey just found, and reorder all the categories by the log2 w least
significant bits of this new half of the ciphertext. At the same time, subtract
the value of the current last rotation from the total rotation amount for each
ciphertext. This whole step takes 4 × 2^15 × w operations.

Thus the total complexity for the last 2r − 2 half-rounds is:

$$ C = (2r - 2)\,[\,8 \times w \times 2^{15}\,] \tag{28} $$

As an example, for RC5-32/12/16 this complexity equals C = 22 × 2^23. As for
the last four subkeys, the cost is almost negligible.

In conclusion, the attack can be carried out with the encryption of only
2^15 × w plaintexts, and the time complexity of the analysis phase is roughly
equivalent to searching the last 2r − 2 subkeys. The total complexity is:

$$ C = (2r - 2)\,[\,8 \times w \times 2^{15}\,] = (2r - 2) \times 2^{18 + \log_2 w} \tag{29} $$

These results were all checked by computer experiments based on Method A.
In fact, the complexity is slightly higher because the right value of the log2 w least
significant subkey bits is not always suggested by the best indicator. Sometimes
the second- or third-best indicator corresponds to the right subkey. Thus, when
implementing the attack, we had to make some complexity tradeoffs. However,
when the top indicators are quite close, trying the 8 most likely paths is still
possible. This only applies while searching the log2 w least significant subkey
bits; all other bits are always correctly guessed.

Experiments show that, using Method A, when the guess of the log2 w least
significant bits is wrong, the indicators over the next 4 half-rounds tend towards
a characteristic pattern. Therefore the right key path can still be identified with
little extra effort.

Considering RC5-32/12/16, for example, if there is no "wrong" indicator, the
best complexity is around 2^28 for one key search. On average, no more than 8
subkeys are expected to lead to a partial exhaustive path search. For every such
key, searching through the 8 keys associated with the 8 best indicators over two
half-rounds leads to the right result. Nevertheless, for the subkeys S_7 through
S_4, this path search no longer applies. Therefore we exhaustively search
through the 8 best subkeys for these four half-rounds. The overall extra work
factor should not exceed 8 × 8^2 + 8^4 = 2^9 + 2^12. However, this leads to an upper
bound on the complexity which is far from optimal.

In summary, the overall complexity of our attack does not exceed 2^40
operations for RC5-32/12/16. On average, however, the complexity is considerably lower.
It is more realistic to consider the best case, where the analysis takes only 2^28
operations.

In the general case, the complexity of the attack does not exceed:

$$ (2r - 2) \times 2^{30 + \log_2 w} \tag{30} $$

On average it is closer to the theoretical complexity

$$ C = (2r - 2) \times 2^{18 + \log_2 w} \tag{31} $$

and does not require more than N_max = 2^15 × w ciphertexts and their
associated timings, as well as a few known plaintexts to derive the last subkeys.
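For a quick numerical reading of equations (28) to (31) and of the path-search overhead, the Python sketch below evaluates them in log2 for given w and r. It simply transcribes the formulas of this section; it does not measure anything, and the function name is hypothetical.

```python
import math


def complexity_log2(w=32, r=12, per_category=2**15):
    """log2 of the section 7 quantities for RC5-w/r/b (formulas only, no measurement)."""
    log_w = int(math.log2(w))
    half_rounds = 2 * r - 2
    typical = half_rounds * 8 * w * per_category      # equations (28), (29), (31)
    worst = half_rounds * 2 ** (30 + log_w)            # equation (30)
    with_paths = typical * (8 * 8**2 + 8**4)           # typical cost x (2^9 + 2^12)
    n_max = per_category * w                           # ciphertext timings needed
    return {
        "typical": math.log2(typical),
        "worst": math.log2(worst),
        "with_paths": math.log2(with_paths),
        "n_max": math.log2(n_max),
    }


print({k: round(v, 2) for k, v in complexity_log2().items()})
```

For RC5-32/12/16 the typical cost lands just under 2^28, both worst-case readings stay just under 2^40, and the number of timings is 2^20, in line with the summary above.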

8 Conclusion 

We have shown in some detail how to derive the extended secret key table of
RC5-32/12/16 by a timing attack using only about 2^20 ciphertext timings and
in time complexity 2^28 in the best case, and 2^40 in the worst case. This confirms
Kocher's statement that RC5 is at some risk on platforms where rotations take a
variable amount of time, and suggests that great care is needed when implementing RC5
on such platforms. Adding a random time to each encryption will not help, as it
will have little influence on the variance computations. Therefore we suggest
adding the right number of "dummy" rotations to achieve a constant time
for every encryption, whatever the initial total amount of rotations.
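A minimal sketch of the suggested countermeasure, written in a bit-at-a-time timing model that matches the hypothesis of section 3.2 (a hypothetical cost model, not a real device): after the 2r data-dependent rotations, dummy single-bit rotations on a scratch word top the step count up to the maximum 2r(w − 1), so the modelled cost no longer depends on the data. A real implementation would also have to make sure the dummy steps cost exactly as much as real ones and are not optimized away.

```python
W = 32
MASK = (1 << W) - 1


def rotl_stepwise(x, y):
    """Left-rotate one bit at a time; returns (result, number of single-bit steps)."""
    steps = 0
    for _ in range(y % W):
        x = ((x << 1) & MASK) | (x >> (W - 1))
        steps += 1
    return x, steps


def rc5_encrypt_padded(S, L0, R0, r=12):
    """RC5-32 encryption (equation (1)) whose modelled rotation cost is constant.

    Returns ((L_{2r+1}, R_{2r+1}), steps), where steps is always 2r * (w - 1).
    """
    L, R = (L0 + S[0]) & MASK, (R0 + S[1]) & MASK
    steps = 0
    for i in range(1, 2 * r + 1):
        rotated, used = rotl_stepwise(L ^ R, R)
        steps += used
        L, R = R, (rotated + S[i + 1]) & MASK
    # Dummy single-bit rotations on a scratch copy bring the total up to the
    # maximum possible amount, 2r * (w - 1), independently of the data.
    scratch = L
    for _ in range(2 * r * (W - 1) - steps):
        scratch = ((scratch << 1) & MASK) | (scratch >> (W - 1))
        steps += 1
    return (L, R), steps
```

Padding to the worst case roughly doubles the average rotation cost, which is the price of removing the dependence of the total on the data.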
Abstract. The cipher family SPEED (and an associated hashing mode)
was recently proposed in Financial Cryptography ’97. This paper crypt- 
analyzes that proposal, in two parts: First, we discuss several troubling 
potential weaknesses in the cipher. Next, we show how to efficiently break 
the SPEED hashing mode using differential related-key techniques, and 
propose a differential attack on 48-round SPEED. These results raise 
some significant questions about the security of the SPEED design. 



1 Introduction 

In Financial Cryptography ’97, Zheng proposed a new family of block ciphers,
called SPEED. One specifies a particular SPEED cipher by choosing
parameters such as the block size and number of rounds; the variations are otherwise
alike in their key schedule and round structure. Under the hood, SPEED is built 
out of an unbalanced Feistel network. Zheng also proposed a hash function based 
on running a SPEED block cipher in a slightly modified Davies-Meyer mode. 

One of the main contributions of the SPEED design is its prominent use 
of carefully chosen Boolean functions which can be shown to have very good 
non-linearity, as well as other desirable theoretical properties. One might there- 
fore hope that SPEED rests on a solid theoretical foundation in cryptographic 
Boolean function theory. Nonetheless, this paper describes serious weaknesses in 
the cipher; many lead to practical attacks on SPEED. 

This paper is organized as follows. Section 2 briefly summarizes the SPEED
design. We then discuss some preliminary analysis of the SPEED design,
including comments on differential characteristics in SPEED and on the
non-surjectivity of the SPEED round function. Then we shift emphasis and
discuss differential characteristics for SPEED with 48 rounds. There appears
to be an obvious 1-bit characteristic over 40 rounds; however, this
characteristic does not actually work. We discuss this and other failed
characteristics, and then describe how a differential attack
can be mounted despite these problems. Next, we give differential related-key
attacks on the block cipher and show how to apply them to efficiently find
collisions in the SPEED hash function. We then assess the practical implications
of these attacks, proposing a rule of thumb to characterize when we can
consider a parameterized cipher family "broken." Finally, we conclude this paper in
Section 5.

2 Background 

SPEED is a parameterized unbalanced Feistel cipher with a variable block width w, a variable key length l, and a variable number of rounds R (R must be a multiple of 4 and w a multiple of 8). The block is split into eight equally sized pieces $B_7, \ldots, B_0$. The round function is then characterized by

$$t(B_7, \ldots, B_1, B_0) = \bigl(B_6, \ldots, B_1, B_0,\; r(B_7) \boxplus K_i \boxplus v(F_i(B_6, \ldots, B_1, B_0))\bigr)$$

where $\boxplus$ denotes addition modulo $2^{w/8}$, $F_i$ is a round-dependent function with a w/8-bit output, v is a data-dependent rotation, $K_i$ is a round-dependent subkey, and r is a bijective function from $\{0,1\}^{w/8}$ to $\{0,1\}^{w/8}$ (r is always a right rotation by $w/16 - 1$ bits). See Figure 1 for one round of SPEED.
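As a reading aid, the round structure above can be sketched in Python. This is a minimal sketch under stated assumptions, not the SPEED reference code: it fixes w = 64 (8-bit pieces), takes the round-dependent function `F` as a parameter, and uses the identity as a placeholder for the data-dependent rotation `v`, whose exact definition is not reproduced here. The helper names `rotr` and `speed_round` are our own.

```python
from typing import Callable, List, Sequence

W = 64                      # block width w in bits (assumption for this sketch)
WORD = W // 8               # each piece B_j is w/8 bits wide
MASK = (1 << WORD) - 1


def rotr(a: int, b: int, bits: int = WORD) -> int:
    """Rotate the `bits`-bit word a right by b positions."""
    b %= bits
    return ((a >> b) | (a << (bits - b))) & ((1 << bits) - 1)


def speed_round(block: Sequence[int], k_i: int,
                F: Callable[..., int],
                v: Callable[[int], int] = lambda x: x) -> List[int]:
    """Map (B7, ..., B0) to (B6, ..., B0, r(B7) + K_i + v(F(B6, ..., B0))).

    `block` lists the pieces in the order B7, B6, ..., B0.  The identity
    default for `v` is a placeholder, NOT SPEED's real data-dependent rotation.
    """
    target, source = block[0], list(block[1:])
    r_target = rotr(target, W // 16 - 1)            # r: fixed right rotation
    new = (r_target + k_i + v(F(*source))) & MASK   # addition modulo 2^(w/8)
    return source + [new]


# Toy usage with a hypothetical F that always returns 0:
print(speed_round([0x80, 1, 2, 3, 4, 5, 6, 7], 0x10, lambda *xs: 0))
# [1, 2, 3, 4, 5, 6, 7, 32]
```

The source pieces simply shift one position, and only the new last piece depends on the key and on F.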



Fig. 1. One round of SPEED, acting on the block pieces $B_7, B_6, \ldots, B_0$.
We sometimes refer to the seven input pieces, $B_6, \ldots, B_0$, as the source block and the modified piece, $B_7$, as the target block. The number of rounds is a parameter; the SPEED paper suggests using at least 48 rounds for adequate security for the block cipher, and at least 32 rounds for the hash function. We assume that the underlying CPU operates on w/8-bit words.

The round-dependent function $F_i$ takes seven w/8-bit inputs and produces one w/8-bit output. The function v takes a w/8-bit input and produces a w/8-bit output by rotating the input by a number of bits that depends on the input. The function that results from combining $F_i$ and v will be denoted $h_i$:

$$h_i(x_6, \ldots, x_0) = v(F_i(x_6, \ldots, x_0))$$

The exact details of the rotation in v are not important to our analysis and may be found in Zheng's description of SPEED. In this paper, we will write $a \gg b$ to stand for the result of rotating a right by b bits.

There are four different F functions, $F_i$ for $i \in \{1, 2, 3, 4\}$, each used for a different set of rounds. Each $F_i$ is built out of a single (1-bit) Boolean function $f_i$ on 7 (1-bit) variables, which is extended to a w/8-bit function by bitwise parallel evaluation. In other words, bit position p of the output depends only on seven input bits, each at position p in one of the input words. For example, $F_1$ is

$$F_1(x_6, \ldots, x_0) = x_6 x_3 \oplus x_5 x_1 \oplus x_4 x_2 \oplus x_1 x_0 \oplus x_0$$

where $x_i x_j$ denotes bitwise AND. SPEED uses $F_1$ for the first R/4 rounds, then $F_2$ for the next R/4 rounds, and so on.

In summary, each round of SPEED uses one evaluation of F, two rotations, and an addition modulo $2^{w/8}$ to update the block.
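Because each $F_i$ is one Boolean function applied independently at every bit position, it maps directly onto word-wide AND and XOR. The sketch below (our own illustration, assuming w = 64 so that words are 8 bits) evaluates $F_1 = x_6x_3 \oplus x_5x_1 \oplus x_4x_2 \oplus x_1x_0 \oplus x_0$ this way and checks, on random inputs, the property the attacks in this paper rely on: flipping bit p of one input word can change only bit p of the output.

```python
import random

WORD = 8                    # w/8 for w = 64 (assumption for this sketch)
MASK = (1 << WORD) - 1


def F1(x6: int, x5: int, x4: int, x3: int, x2: int, x1: int, x0: int) -> int:
    """F1 evaluated on all WORD bit positions in parallel (AND for products, XOR for sums)."""
    return ((x6 & x3) ^ (x5 & x1) ^ (x4 & x2) ^ (x1 & x0) ^ x0) & MASK


def flip_stays_in_position(trials: int = 1000, seed: int = 1) -> bool:
    """Flip one random input bit many times; the output may change only at that bit position."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randrange(1 << WORD) for _ in range(7)]
        word, bit = rng.randrange(7), rng.randrange(WORD)
        ys = list(xs)
        ys[word] ^= 1 << bit                      # flip a single input bit
        if (F1(*xs) ^ F1(*ys)) not in (0, 1 << bit):
            return False
    return True


print(flip_stays_in_position())   # True
```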

3 Preliminary Analysis 

In this section, we begin analysis of SPEED with respect to potential cryptana- 
lytic attacks. 



3.1 Non-Surjectivity of the Round Function

The round function involves generating a word to add (modulo the word size) to the target block. However, due to a flaw in the round function's design, we can significantly limit the possible output values. All output values are possible from F. However, the data-dependent rotation in v limits the possible values actually added to the target block. In other words, the combined function $v \circ F$ is non-surjective. Rijmen et al. identified several attacks on ciphers with non-surjective round functions, so this property of SPEED is worrisome.

Applying the attack of Rijmen et al. is not as straightforward as one might hope. Their attack depends on the fact that the range of the h-function is known, and hence one can perform hypothesis testing based upon the combined output of several h-functions. However, for SPEED the output of each h-function is combined with a subkey, and hence the range is unknown. Fortunately, we know the shape that the distribution should take, and it would appear that one can modify the analysis to perform hypothesis testing using the shape of the entire distribution rather than individual values of the distribution.

We have performed a preliminary analysis of a modified version of the cipher in which the data-independent rotate is removed from each round. This allows us to write out an equation which is the sum of 6 subkeys (assuming a 48-round cipher), the outputs of 6 h-functions, part of the input block, and part of the output block. It appears that one simply performs $2^{w/8}$ hypothesis tests, one for each possible value of the sum of the 6 subkeys, selecting the value which produces the distribution closest to that of the sum of 6 h-function outputs.
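The hypothesis test can be illustrated on a toy model. The sketch below is our own illustration, not the authors' procedure: a hypothetical non-surjective function `toy_h` on 8-bit words only ever outputs 0..63, each observation is (h-output + K) mod 256 for an unknown key sum K, and every candidate K' is scored by the log-likelihood of the values (observation − K') under the known output distribution. In the real attack, the reference distribution would be that of the sum of several h-function outputs, which is not modeled here.

```python
import math
import random
from collections import Counter
from typing import List

MOD = 1 << 8                 # toy 8-bit words


def toy_h(rng: random.Random) -> int:
    """Hypothetical non-surjective h-function: values 64..255 never occur."""
    return rng.randrange(64)


def reference_distribution() -> List[float]:
    """Known shape of the toy h-output distribution (uniform on 0..63)."""
    return [1 / 64 if s < 64 else 0.0 for s in range(MOD)]


def recover_key_sum(observations: List[int], ref: List[float], eps: float = 1e-12) -> int:
    """Return the candidate key sum whose shifted distribution best explains the data."""
    counts = Counter(observations)

    def log_likelihood(candidate: int) -> float:
        # eps keeps log() finite when a candidate maps a sample outside the support.
        return sum(n * math.log(ref[(o - candidate) % MOD] + eps)
                   for o, n in counts.items())

    return max(range(MOD), key=log_likelihood)


rng = random.Random(2024)
secret = 0xA7                                  # unknown sum of subkeys (toy value)
observations = [(toy_h(rng) + secret) % MOD for _ in range(1000)]
print(hex(recover_key_sum(observations, reference_distribution())))
```

Scoring against the whole distribution, rather than a single value, is what lets the test work even though the key addition hides the absolute range.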

Analysis gets substantially more difficult when the data-independent rotate is taken into account, since the carry bits that “cross the rotation boundary” spoil naive attempts to isolate the sum of the h-function outputs from the sum of the subkeys. Nonetheless, for larger word sizes the spread of the carry bits is limited. We conjecture that it may be possible to extend the technique to handle the data-independent rotations in these cases, though the analysis will almost certainly be much messier.

3.2 Implications of Correlated Outputs 

The outputs of successive round functions are correlated. For instance,

$$F_1(x_6, \ldots, x_0) = F_1(x_7, \ldots, x_1)$$

with probability 1/2 + 1/32 over all choices of $x_0, \ldots, x_7$. This shows that there are non-trivial correlations between successive outputs of $F_1$; this could potentially lead to correlations between successive outputs of h. Similar correlations occur for $F_3$ and $F_4$, though this property does not seem to hold for $F_2$.
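The 1/2 + 1/32 figure for $F_1$ can be checked by exhaustive enumeration, since only the eight bits $x_0, \ldots, x_7$ are involved. The sketch below assumes the expression $F_1 = x_6x_3 \oplus x_5x_1 \oplus x_4x_2 \oplus x_1x_0 \oplus x_0$ for the underlying Boolean function and counts the inputs on which the two evaluations agree.

```python
from itertools import product


def f1(x6, x5, x4, x3, x2, x1, x0):
    """Single-bit Boolean function underlying F1 (assumed form, see lead-in)."""
    return (x6 & x3) ^ (x5 & x1) ^ (x4 & x2) ^ (x1 & x0) ^ x0


agree = 0
for x7, x6, x5, x4, x3, x2, x1, x0 in product((0, 1), repeat=8):
    # Compare F1 on (x6, ..., x0) with F1 on the window shifted by one, (x7, ..., x1).
    if f1(x6, x5, x4, x3, x2, x1, x0) == f1(x7, x6, x5, x4, x3, x2, x1):
        agree += 1

print(agree, agree / 256)   # 136 0.53125, i.e. 1/2 + 1/32
```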

We have not been able to extend this troublesome weakness to a full attack on SPEED. Nonetheless, this gives another indication of how the SPEED design may get a great deal less strength than expected from the very strong Boolean functions chosen for it. If successive h-function outputs are correlated strongly enough, this could make linear or differential-linear attacks possible on the cipher. We leave this as an open question for future research.

3.3 Differential Characteristics 

The key insight necessary for mounting a differential attack on this round function is that F is built from a very good nonlinear Boolean function, but applies that function to each bit position of the source block words independently. In other words, F exhibits very poor diffusion across bit positions. Therefore, we see immediately that flipping any one bit in the input of F can only affect one output bit of F. In particular, if the underlying Boolean function behaves “randomly,” flipping an input bit of F should leave its output unchanged with probability 1/2.

This would appear, at first glance, to yield a very simple eight-round iterative differential characteristic with probability approximately $2^{-8}$. However, there is a problem: the straightforward attack doesn't work, as we explain below.
Complications. There are two complications to our differential attack. First, the Boolean functions aren't really random, and so don't have quite the probability distribution we would have expected. Table 1 lists the probability that the output of F remains unchanged after flipping the input bit at position i.





Input bit i    0     1     2     3     4     5     6
F1             .5    .5    .5    .5    .5    .5    .5
F2             .5    .5    .5    .25   .5    .5    .75
F3             .5    .5    .5    .5    .5    .5    .5
F4             .5    .5    .5    .5    .5    .5    .5

Table 1. Probability that the output of F_i remains unchanged after flipping one input bit.
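The F1 row of Table 1 can be reproduced by enumerating all 128 inputs of the underlying 7-variable Boolean function, again assuming $F_1 = x_6x_3 \oplus x_5x_1 \oplus x_4x_2 \oplus x_1x_0 \oplus x_0$. The same helper would produce the other rows if given the Boolean functions for $F_2$, $F_3$ and $F_4$, which are not reproduced in this paper.

```python
from itertools import product


def f1(x6, x5, x4, x3, x2, x1, x0):
    """Single-bit Boolean function underlying F1 (assumed form, see lead-in)."""
    return (x6 & x3) ^ (x5 & x1) ^ (x4 & x2) ^ (x1 & x0) ^ x0


def unchanged_probability(f, position: int) -> float:
    """Fraction of the 128 inputs for which flipping x_position leaves f unchanged."""
    same = 0
    for bits in product((0, 1), repeat=7):
        xs = list(bits)              # xs[k] holds x_(6-k), matching f's argument order
        ys = list(bits)
        ys[6 - position] ^= 1        # flip input x_position
        same += f(*xs) == f(*ys)
    return same / 128


print([unchanged_probability(f1, i) for i in range(7)])
# [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```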



The second complication is much, much more problematic. SPEED is, in the terminology of unbalanced Feistel networks (UFNs), a source-heavy UFN. Furthermore, there is no key addition before the input of the F-function. This means that the inputs to the Feistel F-function in successive rounds can't be assumed to be independent, as they generally can be in a balanced Feistel network or in a target-heavy UFN.

If the inputs to successive rounds' F-functions aren't independent, then it's possible that the success probabilities of each round's differential characteristic are also not independent. In fact, taking this effect into account, we find that the probability for six of the eight possible eight-round (one-bit) iterative characteristics is precisely 0: the inter-round dependence makes it impossible for the characteristic to penetrate more than eight rounds. When extended to an eleven-round characteristic (across $F_2$), the characteristic always has a probability of 0. However, later in this paper we will show how to fix the attack by tweaking the differential characteristic.

The problem of inter-round dependence has previously been mentioned as a theoretical possibility; here we give a practical example where it arises. The dependence also arises again in Section 6, where precisely this difficulty complicates a related-key attack on the cipher.

4 More on Differential Characteristics 

Differential characteristics are also possible with more than one bit at the same 
bit position set. The flipped bits rotate through different bit positions (due to the 
constant rotate in the round function), but end up in the same bit position for 
most rounds. We will discuss the use of such characteristics later in this paper. 

As we have already noted, the obvious 1-bit differential attack against the cipher does not work. The problem is that we will have a hard time ramming our differential through four rounds where $F_2$ is the F-function used for each round. Suppose that we have the 3-round characteristic in Table 2, starting with round i.
Table 2. Failed 1-bit Differential Characteristic. $\Delta m_{i,j}$ is the value of the difference in data block j at the input of round i.



The problem is that this characteristic is impossible: we will not have the desired differences after round i + 1. The reason this characteristic always fails is that successive outputs of $F_2$ are correlated, as discussed in Section 3. This means that the characteristic's round probabilities are not independent, so we cannot simply multiply them to obtain the full probability. It is fairly easy to see that any 1-bit characteristic across 11 rounds of $F_2$ will necessarily contain this 3-round characteristic, and hence the differential fails to attack the cipher with 44 or more rounds.

Unfortunately, when trying to find a differential characteristic that would work, we found that even slightly more complicated differentials also failed. Consider the 2-bit (8-round) differential given in Table 3. We found that repeatedly concatenating this differential causes it to fail after at most 12 rounds (assuming a 48-round cipher). In the next section we discuss the analysis we used to determine that both of the above differentials would fail.

Table 3. Failed 2-bit Differential Characteristic.
4.1 A Failure Test for Differential Characteristics

If we make the assumption that all subkeys are independent and random, then 
there is a rather simple analysis technique we can apply to determine when a 
characteristic might fail. The technique will never fail a legitimate characteristic, 
but it may give a false positive. Therefore, passing the test does not imply that 
a characteristic will work. 

The first observation is that, with high probability, in characteristics such as those found in Table 2 and Table 3, bit j of the word updated in round i will only be affected by bit j of the remaining words $m_{i,k}$ and bit j of the subkey. Hence one can construct a state transition diagram in which each node contains the round number, bit j of the appropriate subkey, and bit j of each $m_{i,k}$ (for both halves of a pair satisfying the characteristic). We connect two nodes with a directed arc if one node is a legitimate successor of the other. This requires that the round numbers differ by 1, the $m_{i,k}$ satisfy the appropriate relations (as defined by the round function), and the output difference of the appropriate F-function applied to the $m_{i,k}$ is 0.

Once we have constructed such a graph, we can view it as several different layers, one for each round of the characteristic. Clearly all edges originating from layer i in this construction will end in layer i + 1. Therefore we can construct an adjacency matrix $A_i$ for the transition from layer i to i + 1 and an overall adjacency matrix $A = \prod_i A_i$. The test we propose is to check whether A is the zero matrix. If it is, there is no transition from a starting state to an ending state which satisfies our characteristic. Therefore we can eliminate those characteristics for which A is the zero matrix.
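The test reduces to a Boolean matrix product. The sketch below is a minimal illustration in which the layer matrices are small hypothetical examples; in the actual analysis each row and column would correspond to a node (round number, subkey bit, and the bits $m_{i,k}$ for both halves of the pair), and an entry is 1 when the corresponding arc exists.

```python
from functools import reduce
from typing import List

Matrix = List[List[int]]


def bool_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Boolean product: entry (r, c) is 1 iff a[r][k] = b[k][c] = 1 for some k."""
    return [[int(any(a[r][k] and b[k][c] for k in range(len(b))))
             for c in range(len(b[0]))]
            for r in range(len(a))]


def characteristic_impossible(layers: List[Matrix]) -> bool:
    """True if A = A_1 A_2 ... A_n is the zero matrix, i.e. no path crosses every layer."""
    A = reduce(bool_matmul, layers)
    return not any(any(row) for row in A)


# Hypothetical two-layer examples with two states per layer.
possible = [[[1, 0], [0, 1]], [[0, 1], [0, 0]]]
dead_end = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
print(characteristic_impossible(possible), characteristic_impossible(dead_end))  # False True
```

Using Boolean rather than integer products keeps the entries at 0 or 1, which is all the test needs: it only asks whether some path exists.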

There is a small complication in that the last few rows of both of the characteristics we proposed above show non-zero bit differences in other bit positions. However, we observed that the remaining bits adjacent to bit j' (where $B = 2^{j'}$) are effectively random. Therefore, to simplify our analysis, we viewed the resulting 40-round characteristic as several superimposed n-round characteristics. We made the simplifying assumption that each characteristic was independent of the others, and hence we were only concerned that each of the characteristics was independently possible. This assumption seems well justified, especially given that it will not eliminate legitimate characteristics (and we are only presently concerned with eliminating bogus characteristics).

Using these techniques, we examined each of the characteristics given in Table 2 and Table 3. Our analysis found that neither of these characteristics would work, so we were able to eliminate them. In fact, we analyzed the next most obvious 2-bit characteristic, given in Table 4, but found that it also fails.

Fortunately, the differential given in Table 5 did not fail. In fact, one can construct independent round keys, a plaintext, and a ciphertext for which the differential holds. However, there appears to be one small difficulty even with this differential: it will not work for all keys. Further research may reveal a differential (or family of differentials) which will work against all keys.
Table 4. Another Failed 2-bit Differential Characteristic.

r      Δm_{i,7}  Δm_{i,6}  Δm_{i,5}  Δm_{i,4}  Δm_{i,3}  Δm_{i,2}  Δm_{i,1}  Δm_{i,0}   P
-      0         0         0         0         0         A         A         A          -
i      0         0         0         0         A         A         A         0          1/2
i+1    0         0         0         A         A         A         0         0          1/2
i+2    0         0         A         A         A         0         0         0          1/2
i+3    0         A         A         A         0         0         0         0          1/2
i+4    A         A         A         0         0         0         0         0          1/4
i+5    A         A         0         0         0         0         0         B          1/8
i+6    A         0         0         0         0         0         B         B          1/4
i+7    0         0         0         0         0         B         B         B          1/2

Table 5. A Partially Successful 3-bit Differential Characteristic.



5 Mounting an Effective Differential Attack 

The important point that we should remember from the previous sections is that, in general, input differences with a small Hamming weight cause output differences with a small Hamming weight with large probability. Even if a pair does not follow one of the described characteristics, the Hamming weight of the output difference will usually be about the same as the Hamming weight of the input difference. This behavior is quite similar to that of RC2 and RC5. Therefore we will use a variant of the differential attack that was developed to attack RC5 and has also been used on RC2.
5.1 Differentials and Characteristics

It has been observed that the effectiveness of a differential attack depends on the probability of a differential, rather than on the probability of a characteristic. Indeed, when applying a differential attack to a block cipher, we are only concerned with the values of the differences in the first and last few rounds. The intermediate differences are not important. The analysis of RC2 has shown that in ciphers with limited diffusion, the difference in probability between characteristics and differentials may be significant.

We verified experimentally that there exist several 12-round differentials that go from a one-bit input difference to a one-bit output difference with significant probability, even for the cases where it is difficult to find a differential characteristic with nonzero probability (cf. Section 4). These 12-round differentials can
be combined to produce longer differentials. For example, the 48-round differential with input difference (0, 0, 0, 0, 0, 40, 0, 0) (in base-16) and output difference (80, 0, 0, 0, 0, 0, 0, 0) (in base-16) has probability $2^{-80}$ (this holds exactly for 64-bit blocks, but the probability stays approximately the same for larger block lengths).

In fact, in our attack we will loosen the restrictions on the output difference, 
and consider for the last round only the Hamming weight of the difference, rather 
than specific values. 

5.2 Key Recovery 

The key recovery procedure works as follows. We choose pairs of plaintexts with a non-zero difference in one bit of a single source piece only. We select the texts such that the output difference of h in the first round is zero. This is made particularly easy by the absence of a key addition at the inputs of F. Whether the output difference of $F_1$ in the second round is also zero, as required by the characteristic, depends only on the plaintexts and the unknown value of the first round key.

When we find a pair with a small output difference, we assume that it follows the characteristic in the second round. This gives us information about the value of the first round key. By judicious choice of the plaintexts, we can determine the bits of the first round key a few at a time. More requirements could be imposed on the plaintext pairs in order to make sure that all pairs pass the first two rounds with probability one. This would, however, complicate the key recovery phase.
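Because no key is added in front of F, the first-round filter can be applied before anything is encrypted. The sketch below is our own illustration under assumptions that the text does not fix: w = 64, pieces ordered (B7, ..., B0), XOR differences, the input difference (0, 0, 0, 0, 0, 40, 0, 0) (base-16) from Section 5.1, which then lies in source piece B2, and $F_1$ as the first-round function. Since v depends only on F's output, equal F outputs also give equal h outputs.

```python
import random
from typing import Iterator, List, Tuple

WORD = 8
MASK = (1 << WORD) - 1
DELTA = [0, 0, 0, 0, 0, 0x40, 0, 0]   # XOR input difference; pieces ordered B7, ..., B0


def F1(x6: int, x5: int, x4: int, x3: int, x2: int, x1: int, x0: int) -> int:
    """F1 on 8-bit words: x6x3 ^ x5x1 ^ x4x2 ^ x1x0 ^ x0, bit-sliced."""
    return ((x6 & x3) ^ (x5 & x1) ^ (x4 & x2) ^ (x1 & x0) ^ x0) & MASK


def first_round_pairs(n: int, seed: int = 7) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield n plaintext pairs with difference DELTA whose first-round F1 outputs agree."""
    rng = random.Random(seed)
    found = 0
    while found < n:
        p = [rng.randrange(1 << WORD) for _ in range(8)]
        q = [a ^ d for a, d in zip(p, DELTA)]
        # The source block is everything except the target piece B7 = p[0].
        if F1(*p[1:]) == F1(*q[1:]):
            found += 1
            yield p, q


pairs = list(first_round_pairs(5))
print(len(pairs))   # 5
```

About half of the random candidates pass this filter, since flipping one input bit of F1 leaves its output unchanged with probability 1/2.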



5.3 Filtering Issues 

A basic, non-optimal approach is to use a characteristic that determines the differences until the last round of the cipher. This ensures that all wrong pairs are filtered, but the probability of the characteristic is quite low. The fact that differences spread very slowly through the rounds of SPEED can be used to relax the conditions on the pairs. Instead of requiring that the output difference be exactly (80, 0, 0, 0, 0, 0, 0, 0) (in base-16), we will just accept any pair where the Hamming weight of the output difference is below some threshold H. This improves the probability of the characteristic, because pairs are allowed to “fan out” in the last rounds. The disadvantage is that wrong pairs may now be accepted. In order to get correct results, a value of H has to be selected such that the expected number of wrong pairs after filtering is below the expected number of good pairs. A block length of 128 or 256 bits allows for a larger H. Therefore, versions of SPEED with a block length of 128 or 256 bits require a higher number of rounds than the 64-bit version to be secure against the differential attack. The signal-to-noise ratio of the attack can be improved by using more sophisticated filtering techniques.
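The relaxed filter amounts to a Hamming-weight test on the ciphertext difference. The sketch below is illustrative only; the function names and example values are hypothetical, and ciphertexts are represented as lists of w/8-bit words.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[int], Sequence[int]]


def difference_weight(c1: Sequence[int], c2: Sequence[int]) -> int:
    """Hamming weight of the XOR difference between two ciphertexts."""
    return sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))


def filter_pairs(pairs: List[Pair], H: int) -> List[Pair]:
    """Keep only pairs whose output difference has Hamming weight below the threshold H."""
    return [(c1, c2) for c1, c2 in pairs if difference_weight(c1, c2) < H]


example = [
    ([0x80, 0, 0, 0, 0, 0, 0, 0], [0] * 8),      # weight 1: a plausible right pair
    ([0xFF] * 8, [0] * 8),                        # weight 64: filtered out
]
print([difference_weight(c1, c2) for c1, c2 in example])   # [1, 64]
print(len(filter_pairs(example, H=4)))                      # 1
```

Raising H lets more right pairs through but also admits more wrong pairs, which is the trade-off described above.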



5.4 Experimental Results 

We implemented differential attacks on versions of SPEED with a reduced number of rounds. The results are given in Table 6. For versions with more than 28 rounds, the plaintext requirements become impractical. From the obtained results, we extrapolated a lower bound on the number of plaintext pairs that a differential attack on SPEED with R rounds requires. Because of the effects described in Section 4, the plaintext requirements will probably increase even more if R > 44. This means that SPEED with 48 rounds and a block length of 64 bits is secure against our differential attack. For versions with a block length of 128 or 256 bits, more than 64 rounds are needed to obtain adequate resistance.



# rounds   success rate   # pairs
16         100%
20         100%           2^18
24         100%           2^25
28         80%            2^31

Table 6. Experimental results for reduced versions of SPEED. The numbers in the first three rows are obtained from 100 experiments each; the numbers in the last row are obtained from 10 experiments.



6 Related Key Attack 

Related-key attacks are a class of attacks in which one examines the results of encrypting the same plaintext under related keys. We perform such an analysis to produce collisions in the encryption function: two keys which encrypt the same plaintext to the same ciphertext.

In the SPEED paper, the author suggests using SPEED in a variant of Davies-Meyer hashing in order to transform SPEED into a hash function. Specifically, a message is padded to a multiple of 256 bits by appending unique padding and a length (the exact details of the padding are beyond the scope of this paper). The resulting message M is split into 256-bit chunks $M_0, M_1, \ldots$, and the hash is the final chaining value, where

$$D_0 = 0, \qquad D_i = D_{i-1} + E_{M_{i-1}}(D_{i-1}),$$

and $E_K(X)$ denotes the encryption of X with key K. (Addition is defined slightly differently, but the exact definition does not affect our attack, so we omit it.)
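The chaining rule can be sketched as follows. This is a minimal illustration with two simplifying assumptions: plain addition modulo $2^{128}$ stands in for SPEED's slightly different addition, and `toy_encrypt` is a hypothetical keyed permutation that ignores the first key byte, so any two keys differing only there behave like a related-key collision. It shows why such a collision, $E_K(X) = E_{K'}(X)$ for the current chaining value, immediately yields two messages with the same hash.

```python
from typing import Callable, Sequence

N_BITS = 128
MOD = 1 << N_BITS


def dm_hash(chunks: Sequence[bytes], encrypt: Callable[[bytes, int], int]) -> int:
    """D_0 = 0 and D_i = D_{i-1} + E_{M_{i-1}}(D_{i-1}) mod 2^128; returns the last D."""
    d = 0
    for m in chunks:
        d = (d + encrypt(m, d)) % MOD
    return d


def toy_encrypt(key: bytes, x: int) -> int:
    """Hypothetical stand-in for E_K(X); NOT SPEED.  It ignores key[0]."""
    k = int.from_bytes(key[1:], "big") % MOD
    return (5 * x + k) % MOD          # a bijection in x because 5 is odd


first = bytes(32)                                  # 256-bit chunk M_0
M = [first, b"\x01" + b"\x22" * 31]                # 256-bit chunk M_1
M_prime = [first, b"\x02" + b"\x22" * 31]          # differs only in the ignored byte
print(M != M_prime, dm_hash(M, toy_encrypt) == dm_hash(M_prime, toy_encrypt))  # True True
```

Because the message chunk is used as the key, any pair of related keys that collide on the chaining value gives a hash collision for free; this is exactly what the attacks below exploit.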

The cipher may be 64, 128, or 256 bits wide, and hence so may the corresponding hash (although 64 bits would easily fall to a conventional birthday attack). Furthermore, the author of SPEED suggests using 32 to 48 rounds for efficiency. We have successfully produced hash collisions for the 128-bit hashes with 32 rounds and also with 48 rounds. Using the reference implementation obtained from the author, we found the following collision for the 128-bit hash with 32 rounds (in base-16):



M = 21EA FE8E 1637 19F7 22D2 8CCB 3724 3437
B00F 7607 3C91 3710 2B69 C9C9 58FB 0823
AEC2 CD05 FD80 14E6 B11E 43C0 5767 76F7
FF07 17EC FCBA 224E 9627 A16A 8D6E 83A9

M' = 21EA FE8E 1637 19F7 22D2 8CCB 3724 3437
B00F 7607 3C91 3710 2B69 C9C9 58FB 0823
AEC2 CD05 FDC0 14E6 B11E 4380 5767 76F7
FF07 17EC 7CBA 224E 9627 216A 8D6E 83A9



This leads to the following values when hashing (in base-16):

For M:
D_0 = 0000 0000 0000 0000 0000 0000 0000 0000
D_1 = 90DA 7F34 46FA A373 B048 11F7 F8D9 BB3D
D_2 - D_1 = 9781 9517 B5CC A046 D0F1 3719 ED9B A0B6

For M':
D_0 = 0000 0000 0000 0000 0000 0000 0000 0000
D_1 = 90DA 7F34 46FA A373 B048 11F7 F8D9 BB3D
D_2 - D_1 = 9781 9517 B5CC A046 D0F1 3719 ED9B A0B6



We also found the following collision for the 128-bit hash with 48 rounds (in base-16):

M = 3725 6571 48D5 CF52 DAE1 4065 7115 11A0
E3C5 9428 7BFD 18CB EF79 82BB 1D7F 2F55
36F2 CD58 9058 FE57 D696 EA4C BD75 F7C9
1989 A048 39FB 9B76 9011 CAC0 65F6 EBC7

M' = 3725 6571 48D5 CF52 DAE1 4065 7115 11A0
E3C5 9428 7BFD 18CB EF79 82BB 1D7F 2F55
38F2 CB58 9058 FC57 D896 EA4C BD75 F7C9
1985 A04C 39FB 9B7A 900D CAC0 65F6 EBC7

This leads to the following values when hashing (in base-16):

D_0 = 0000 0000 0000 0000 0000 0000 0000 0000
D_1 = DA2B A119 A4F8 AA70 59ED 6FE4 188B 7969
D_2 - D_1 = CAB1 DA86 B6D3 1442 E05C A005 7B26 C432



To produce collisions, we combine two different attacks: 

1. A differential attack against the key schedule which produces two key schedules with a desired difference.

2. A related key attack against the cipher using the two related key schedules. 

We feel that the attack is more illuminating when we address these two attacks 
in the opposite order. Therefore we will describe the related key attack first in 
order to give a motivation for the attack against the key schedule. 

6.1 Related-Key Attack Against the Cipher 

The fundamental observation that makes this attack possible is that a 1-bit input difference to any of the four F-functions will produce a zero output difference with probability 1/2 (actually this doesn't quite hold for $F_2$, but this doesn't seem to strongly affect experimental results). In our attack, we attempt to introduce 1-bit differences into the encryption state through the key schedule. We do this in such a way that after several rounds, the introduced differences cancel each other and the resulting encryption state is the same for both keys. In Appendix 1, we show all 32 rounds of our related-key attack. In summary, we encrypt a message with two different keys where the subkeys for rounds 1, 4, 9, 12, 17, and 25 have specific differences, so that the encryption state under the two keys will be identical in rounds 13-16 and rounds 26-32. Initially the two encryption states are the same, and after round 25, the two encryption states will be identical with total probability $2^{-19}$. Since the remaining keywords of the key schedule are identical, the probability that the same plaintext will encrypt to the same ciphertext under the two different key schedules is also $2^{-19}$.
Note that there are four variations on this attack in which we add 1 to 4 rounds prior to round 1 in which the two key schedules are identical. Hence the subkeys for rounds t + 1, t + 4, t + 9, t + 12, t + 17, and t + 25 with $t \in \{0, 1, 2, 3, 4\}$ are given the specified differences. If we define variant t to be the above differential with t additional initial rounds, then the desired key differences are:

$$\Delta K[i] = \begin{cases} 2^j & \text{if } i \in \{t+1, t+17\} \\ -2^j & \text{if } i \in \{t+4\} \\ -\left(2^j \gg 7\right) & \text{if } i \in \{t+9, t+25\} \\ 2^j \gg 7 & \text{if } i \in \{t+12\} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$



The 32-round collision given at the beginning of this section is for variant 2 with j = 6.
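Equation (1) can be checked against the 32-round collision given at the start of this section. The sketch below assumes that the second 256-bit chunk of each message is the key used to compute $D_2$ from $D_1$, that its sixteen 16-bit words are the subkeys for rounds 1 to 16 in the order printed, and that ΔK[i] is the modular difference K'[i] − K[i]. Under those assumptions, with t = 2 and j = 6, the observed differences match Equation (1).

```python
WORD_BITS = 16                       # 128-bit block -> 16-bit words
MASK = (1 << WORD_BITS) - 1


def rotr(a: int, b: int) -> int:
    """Rotate a 16-bit word right by b bits (the a >> b notation of Section 2)."""
    b %= WORD_BITS
    return ((a >> b) | (a << (WORD_BITS - b))) & MASK


def expected_delta(i: int, t: int, j: int) -> int:
    """Delta K[i] from Equation (1), reduced modulo 2^16."""
    p = 1 << j
    if i in (t + 1, t + 17):
        return p & MASK
    if i == t + 4:
        return -p & MASK
    if i in (t + 9, t + 25):
        return -rotr(p, 7) & MASK
    if i == t + 12:
        return rotr(p, 7)
    return 0


# Second chunks of M and M' from the 32-round collision above.
K = [int(w, 16) for w in
     "AEC2 CD05 FD80 14E6 B11E 43C0 5767 76F7 FF07 17EC FCBA 224E 9627 A16A 8D6E 83A9".split()]
K_prime = [int(w, 16) for w in
           "AEC2 CD05 FDC0 14E6 B11E 4380 5767 76F7 FF07 17EC 7CBA 224E 9627 216A 8D6E 83A9".split()]

observed = [(b - a) & MASK for a, b in zip(K, K_prime)]       # Delta K[1..16]
predicted = [expected_delta(i, t=2, j=6) for i in range(1, 17)]
print(observed == predicted)   # True
```

The differences for rounds t + 17 and t + 25 fall in the derived part of the key schedule, which is what the key-schedule attack of Section 6.2 has to arrange.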



Subtle Difficulties with the Attack 

It is not hard to see that our attack makes the simplifying assumption that the inputs to each round are independent and uniformly distributed. Hence any pair of inputs that will lead to the desired difference is possible. However, in many cases this is an incorrect assumption, and it can strongly affect our attack.

To make the discussion easier, we will examine a smaller version of the cipher with 1-bit words. This is a fair thing to do, since adjacent bits in a word affect each other within the F-function only through the data-dependent rotate. As stated in Section 2, the F-function is composed of a non-linear function $F_i$ ($i \in \{1, 2, 3, 4\}$) composed with a data-dependent rotate v. If the output difference for $F_i$ is zero, then given two different inputs, the output difference of the data-dependent rotate will also be zero. This means that a 1-bit difference in any word will not affect any adjacent bits if the output difference of $F_i$ is zero.

Once we have made the reduction, we can regard the sequence of rounds as the successive states of a non-linear shift register. Specifically, we have a sequence of states $X_i$, where $X_0$ is the input to our encryption function and $X_{i+1} = \bigl(X_i \,\|\, (k_{i+1} \oplus F_k(X_i))\bigr)_{0 \ldots 6}$, where $F_k$ can be $F_1$, $F_2$, $F_3$ or $F_4$, depending on the particular round number. The output of the cipher is then $X_r$, where r is the number of rounds.

In our 32-round related-key attack, $k_i \ne k'_i$ for a small number of i. If one examines the sequences of states produced by the same initial state $X_0$ and two related keys $k_i$ and $k'_i$, then for variant j we want $X_{j+16} \oplus X'_{j+16} = 0$ and $X_{j+17} \oplus X'_{j+17} = 2^0 = 1$. For $i = 0, \ldots, 6$, $k_{i+j+17} = k'_{i+j+17}$, so we can examine a simplified shift register whose feedback function is $Y_{i+1} = \bigl(Y_i \,\|\, (k_{i+1} \oplus G_k(Y_i))\bigr)$, where $G_k = F_3$ if $0 \le i \le 6 - j$, $G_k = F_4$ if $6 - j < i \le 6$, $Y_0 = X_{j+17}$, $Y'_0 = X'_{j+17}$, and k is an arbitrary key. That is, we can consider these 7 rounds (rounds j + 17 to j + 23) in isolation, since the subkeys are the same for both values of our key.
We want to examine the sequences of states $Y_i$ and $Y'_i$, where $Y_0 \oplus Y'_0 = 1$.

In order for our related-key attack to work, we must have $Y_i \oplus Y'_i = 2^i$ for $0 \le i \le 6$. Unfortunately, it turns out that for certain j (e.g. j = 0, 1), there are no input states $Y_0$ and $Y'_0$ and key k for which the resulting sequences have this property. We determined this by performing a brute-force search with a computer. There are only 64 different k (one of the key bits really has no influence) and 64 different $Y_0$ to consider. Hence we need only examine 64 · 64 = 4096 different cases.

Our analysis showed that variants 0 and 1 will not produce the desired col- 
lisions, but that variants 2, 3, and 4 will. In performing a computer search we 
found that we were not able to find collisions within the expected time for vari- 
ants 0 and 1, providing evidence for the correctness of our analysis. The collision 
we provided above was for variant 2, showing that there does exist a satisfying 
sequence of states. 



Extending the Attack to 48 Rounds

Unfortunately, it is not as straightforward as one might hope to extend our attack to 48 rounds. By duplicating the attack of rounds 17-32 for rounds 33-48, we obtain a plausible-looking related-key attack for the 48-round version of the cipher. In summary, we have an additional sixteen subkeys, two of which have specific differences (the subkeys for rounds 33 and 42). However, in attempting to perform a computer search for collisions, we found that we were unable to produce collisions after the first 32 rounds, let alone the entire 48 rounds. This is what initially led us to the analysis performed in the previous section.

It turns out that for the 48-round version of the cipher, the five different 
variants of the attack result in a shift register with feedback function: 






F^iX) ®ki if 0 < i < min (7 — j, 6) 
F^{X) ®ki if 7 — j < i < 6 



where j is the variant we wish to examine. For the resulting shift register, there are no initial states $Y_0$ and $Y'_0$ and key k such that $Y_i \oplus Y'_i = 2^i$ for $0 \le i < 7$. Consequently, our related-key attack will fail to produce a zero output difference after round 32.

A slightly modified version of our attack does work. We present the first 32 rounds of the attack in Appendix 1. The last 16 rounds are simply a copy of rounds 17-32. The associated differential attack on the key schedule is also presented in Appendix 1. In essence, we overlap two differential attacks so that differences are introduced for 9 rounds instead of 8 rounds as
before. The total probability that a key will satisfy the differential is 2“^®. The 
total probability that two related keys will produce a collision is 2“®^. 



6.2 Differential Attack on the Key Schedule

In order to carry out our related-key attack, we must find two different keys which will produce key schedules with the differences specified in Equation (1). We performed a straightforward differential attack on the key schedule to produce the desired pair. The specifications for using SPEED as a hash function require a 256-bit key. Since we are using a 128-bit block width, this implies that the first 16 subkeys will be our key and the remaining subkeys will be derived from the key.

The key scheduling algorithm is straightforward, and we include a modified copy of it here:

Step 1. Let kb[0], kb[1], ..., kb[31] be an array of double-bytes, where kb[0], ..., kb[15] are the original 16 double-byte values of the key.

Step 2. This step constructs kb[16], ..., kb[31] from the user key data kb[0], ..., kb[15]. It employs three double-byte constants $Q_{l,0}$, $Q_{l,1}$, and $Q_{l,2}$.

1. Let $S_0 = Q_{l,0}$, $S_1 = Q_{l,1}$, and $S_2 = Q_{l,2}$.

2. For i from 16 to 31 do the following:

(a) $T = G(S_2, S_1, S_0)$.

(b) Rotate T to the right by 11 bits.

(c) $T = T + S_2 + kb[j] \pmod{2^{16}}$, where $j = i \bmod 16$.

(d) kb[i] = T.

(e) $S_2 = S_1$, $S_1 = S_0$, $S_0 = T$.

where

$$G(X_0, X_1, X_2) = (X_0 \wedge X_1) \oplus (X_0 \wedge X_2) \oplus (X_1 \wedge X_2).$$
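Steps 1 and 2 translate directly into code. The sketch below follows the steps as listed; the three double-byte constants are not given in this paper, so they are passed in as a parameter, and the all-zero values used in the example are placeholders, not SPEED's real constants.

```python
from typing import List, Tuple

WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1


def rotr16(x: int, b: int) -> int:
    """Rotate a 16-bit double-byte right by b bits (0 < b < 16)."""
    return ((x >> b) | (x << (WORD_BITS - b))) & MASK


def G(x0: int, x1: int, x2: int) -> int:
    """Bitwise (x0 AND x1) XOR (x0 AND x2) XOR (x1 AND x2)."""
    return (x0 & x1) ^ (x0 & x2) ^ (x1 & x2)


def expand_key(user_key: List[int], q: Tuple[int, int, int]) -> List[int]:
    """Extend kb[0..15] to kb[0..31] following Steps 1 and 2.

    `q` = (Q0, Q1, Q2) are the three double-byte constants; their real SPEED
    values are not given in this paper, so callers must supply them.
    """
    if len(user_key) != 16:
        raise ValueError("expected 16 double-bytes of key")
    kb = list(user_key) + [0] * 16                 # Step 1
    s0, s1, s2 = q                                  # Step 2.1
    for i in range(16, 32):                         # Step 2.2
        t = G(s2, s1, s0)                           # (a)
        t = rotr16(t, 11)                           # (b)
        t = (t + s2 + kb[i % 16]) & MASK            # (c)
        kb[i] = t                                   # (d)
        s2, s1, s0 = s1, s0, t                      # (e)
    return kb


# Placeholder constants (0, 0, 0), not SPEED's real values.
print(expand_key(list(range(16)), (0, 0, 0))[16:21])   # [0, 1, 2, 3, 101]
```

Each derived word depends on one user-key word plus a small, bitwise-local mixing of the three previous words, which is why differences in the key propagate predictably.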

The two crucial observations about the key scheduling algorithm are: 

1. Adjacent bits in the input have minimal influence on each other. Hence we can work as if G took 1-bit inputs and produced a 1-bit output.

2. When G is viewed as a function on 1-bit inputs, flipping one or two inputs will flip the output with probability exactly 1/2.
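Observation 2 can be checked exhaustively, since on 1-bit inputs G is the 3-input majority function. The sketch below (illustrative names) counts, for each input-flip pattern, how often the output changes over all eight inputs. It also shows that flipping all three inputs always flips the output, because majority is self-dual.

```python
from itertools import product


def G(x0, x1, x2):
    """Majority on 1-bit inputs, written as in the key schedule."""
    return (x0 & x1) ^ (x0 & x2) ^ (x1 & x2)


def flip_probability(mask):
    """Fraction of inputs (x0, x1, x2) for which XOR-ing `mask` into the
    inputs changes the output of G."""
    flips = 0
    for x in product((0, 1), repeat=3):
        y = tuple(xi ^ mi for xi, mi in zip(x, mask))
        flips += G(*x) != G(*y)
    return flips / 8


for mask in product((0, 1), repeat=3):
    print(mask, flip_probability(mask))
```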

7 Practical Considerations: Performance and Security

SPEED is defined as having both a variable block length and a variable number 
of rounds. Since almost any Feistel network is secure after enough rounds, what 
does it mean to say that SPEED is “broken”? For a cryptographic engineer, this
means that the still-secure variants of SPEED are significantly slower than other 
secure alternatives. 

Fig. 2 compares the throughput of SPEED with the throughput of other block ciphers. Benchmarks for the other algorithms were taken from previously published measurements. We only include the SPEED block-length and round-number pairings that we believe are secure.

In the SPEED proposal, the author compares different variants of SPEED with IDEA. Since other published benchmarks put IDEA at twice the speed reported there, we performed our own efficiency analysis on SPEED. We estimate that a fully optimized version of SPEED on a Pentium will take 20 clock cycles per round, for block widths of 64 bits, 128 bits, and 256 bits.
Algorithm          Block width (bits)   # rounds   Clocks/byte of output
Blowfish           64                   16         19.8
Square             128                  8          20.3
RC5-32/16          64                   16         24.8
CAST-128           64                   16         29.5
DES                64                   16         43
SAFER (S)K-128     64                   8          52
IDEA               64                   8          74
Triple-DES         64                   48         116
SPEED              64                   64         160
SPEED              64                   80         200
SPEED              64                   96         240
SPEED              128                  64         80
SPEED              128                  80         100
SPEED              128                  96         120
SPEED              256                  64         40
SPEED              256                  80         50
SPEED              256                  96         60

Fig. 2. Comparison of Different Block Ciphers. 
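The SPEED rows of the table follow directly from the estimate of 20 clock cycles per round: clocks per byte equal 20 times the number of rounds, divided by the block width in bytes. A short sketch of that arithmetic (the helper name is illustrative):

```python
CYCLES_PER_ROUND = 20  # the Pentium estimate for a fully optimized SPEED


def speed_clocks_per_byte(block_bits, rounds, cycles_per_round=CYCLES_PER_ROUND):
    """Clock cycles per byte of output: cycles per block / bytes per block."""
    bytes_per_block = block_bits // 8
    return cycles_per_round * rounds / bytes_per_block


for block_bits in (64, 128, 256):
    for rounds in (64, 80, 96):
        print(f"SPEED {block_bits:>3}-bit, {rounds} rounds: "
              f"{speed_clocks_per_byte(block_bits, rounds):g} clocks/byte")
```

Doubling the block width halves the cost per byte, because the per-round cost is assumed constant across the three widths.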



8 Conclusions 

In this paper, we have discussed the proposed block cipher SPEED in terms of cryptanalytic attack. We have pointed out a few potential weaknesses, demonstrated an attack on the Davies-Meyer hashing mode of SPEED with 32 and 48 rounds, and explored other attacks on the 48-round SPEED block cipher.

It is interesting to note that SPEED, though built using very strong com- 
ponent functions, doesn’t appear to be terribly secure. The SPEED design ap- 
parently relied upon the high quality of the binary functions used, the fact that 
different functions were used at different points in the cipher, and the data- 
dependent rotations to provide resistance to cryptanalysis. Unfortunately, the 
most effective attacks aren’t made much less powerful by any of these defenses. 

It is also interesting to note the new difficulties that occur in attacking this 
kind of cipher. The source-heavy UFN construction of SPEED forced us to recon- 
sider our assumptions about carrying out differential attacks, and added some- 
what to the difficulty of thinking about these attacks. Surprisingly many of the 
obvious differential attacks did not work against the cipher, although it’s not 
obvious that a different choice of F-functions would present the same problems. 
Appendix 1

This section contains the details for the two related key and differential attacks 
we performed against SPEED. 

Note that the first column of Table 7 (and also Table 8) shows the additive differences between the respective words of the two key schedules. For example, $K'[0] - K[0] = 64$. Note that in the 128-bit version of the cipher, each word is 16 bits wide. Hence $m_{ij}$ (the target block) will be rotated right by $16/2 - 1 = 7$ bits, and $64^{\ggg 7} = 2^{15}$. With probability 2“^^, two different key schedules with the specified differences will have the exact same encryption state after round 12.
A The initial input to the cipher is the same for both keys.

B The probability that an additive difference of 64 produces an XOR difference of 64 is 1/2. The probability of 1/4 comes from multiplying this by the probability (1/2) that a 1-bit input difference will produce a zero output difference in the F-function.

C The probability that an additive difference of $-64$ produces an XOR difference of 64 is 1/2. The XOR differences arising from 64 and $-64$ occur in the same bit position, so the probability that the output difference of the F-function is zero is 1/2.

D $64^{\ggg 7} + 2^{15} = 2^{15} + 2^{15} \equiv 0 \pmod{2^{16}}$. Hence the difference $\Delta m$ in the target word is 0 with probability 1 (assuming $\Delta F = 0$, which occurs with probability 1/2).
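The word-level arithmetic behind the rotation and notes B to D can be confirmed by brute force over all 16-bit words. This is a sketch assuming 16-bit words with addition modulo $2^{16}$, as in the 128-bit variant; it does not model the F-function, so the extra factor of 1/2 contributed by the F-function is not checked here.

```python
MASK16 = 0xFFFF  # 16-bit words, addition modulo 2^16


def rotr16(v, r):
    """Rotate a 16-bit word right by r bits."""
    return ((v >> r) | (v << (16 - r))) & MASK16


# 64 = 2^6 rotated right by 16/2 - 1 = 7 bits lands on bit 15.
rot = rotr16(64, 16 // 2 - 1)
print(rot == 2**15)                  # True
print((rot + 2**15) & MASK16)        # note D: 2^15 + 2^15 vanishes mod 2^16 -> 0


def xor_diff_probability(add_diff, xor_diff):
    """Fraction of 16-bit x with ((x + add_diff) mod 2^16) XOR x == xor_diff."""
    hits = sum(1 for x in range(1 << 16)
               if (((x + add_diff) & MASK16) ^ x) == xor_diff)
    return hits / (1 << 16)


print(xor_diff_probability(64, 64))   # note B: 0.5 (bit 6 of x must be 0)
print(xor_diff_probability(-64, 64))  # note C: 0.5 (bit 6 of x must be 1)
```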



Table 7. Related Key Attack Against 32-round Cipher. 
$T_a$ denotes the value of T after step 2a. Similarly, $T_b$ and $T_c$ denote the value of T after steps 2b and 2c, respectively.

Table 9. Differential Attack Against 32-round Key Schedule. 
Table 10. First 16 Rounds of Differential Attack Against 48-round Key Schedule.
Abstract. This paper surveys recent work on the design and analysis of key agreement protocols that are based on the intractability of the Diffie-Hellman problem. The focus is on protocols that have been standardized, or are in the process of being standardized, by organizations such as ANSI, IEEE, ISO/IEC, and NIST. The practical and provable security aspects of these protocols are discussed.



1 Introduction 

Authenticated key establishment protocols are designed to provide two or more 
specified entities communicating over an open network with a shared secret key 
which may subsequently be used to achieve some cryptographic goal such as 
confidentiality or data integrity. Secure authenticated key establishment proto- 
cols are important as effective replacements for traditional key establishment 
achieved using expensive and inefficient couriers. 

Key establishment protocols come in various flavors. In key transport proto- 
cols, a key is created by one entity and securely transmitted to the second entity, 
while in key agreement protocols both entities contribute information which is 
used to derive the shared secret key. In symmetric protocols the two entities 
a priori possess common secret information, while in asymmetric protocols the 
two entities share only public information that has been authenticated. This pa- 
per is concerned with two-party authenticated key agreement protocols in the 
asymmetric setting. 

The design of asymmetric authenticated key agreement protocols has a check- 
ered history. Over the years, numerous protocols have been proposed to meet a 
variety of desirable security and performance requirements. Many of these proto- 
cols were subsequently found to be flawed, and then either were modified to resist 
the new attacks, or were totally abandoned. After a series of attacks and mod- 
ifications, only those surviving protocols which had received substantial public 
scrutiny and were believed to resist all known attacks were deemed secure for 
practical usage. Protocols that evolve from this ‘attack-response’ methodology 
are said to provide heuristic security. 
There are two primary drawbacks of protocols which provide heuristic security. First, their security attributes are typically unclear or not completely
specified. Second, they offer no assurances that new attacks will not be discov- 
ered in the future. These drawbacks make a notion of provable security desirable. 
This would entail specification of a formal model of computing which accurately 
captures the characteristics of the participating entities and a real-life powerful 
adversary, a formal definition of the security goals within this model, a clear 
statement of any assumptions made, and, finally, a rigorous proof that the pro- 
tocol meets these goals within the model. 

While provable security may appear to be the highest possible level of secu- 
rity for a key agreement protocol, the approach does have some limitations. Most 
significantly, it is difficult to judge whether or not real-life threats and security 
goals are adequately reflected in a given model. That is, the provable security 
of a protocol is meaningful only if one finds the model, definitions, and underly- 
ing assumptions to be appropriate for one’s purposes. Nevertheless, significant 
progress has been made in recent years, and authenticated key agreement pro- 
tocols which are both provably secure and efficient have been devised and are 
being used in practice. 

This paper focuses on asymmetric authenticated key agreement protocols 
whose security is based on the intractability of the Diffie-Hellman problem. We 
discuss the practical and provable security aspects of some protocols which are 
being standardized by accredited standards organizations such as ANSI (Ameri- 
can National Standards Institute), IEEE (Institute of Electrical and Electronics 
Engineers), ISO/IEC (International Organization for Standardization/International Electrotechnical Commission), and the U.S. government’s NIST (National Institute
of Standards and Technology). Such cryptographic standards have significant 
practical impact because they facilitate the widespread use of sound techniques, 
and promote interoperability between different implementations. 

The remainder of this paper is organized as follows. §2 summarizes the desirable security and performance attributes of a key agreement protocol. In §3 we review the basic ephemeral (short-term) and static (long-term) Diffie-Hellman key agreement protocols and point out their limitations. §4 presents the KEA, Unified Model, and MQV authenticated key agreement protocols, while in §5 we discuss protocols for authenticated key agreement with key confirmation. The protocols are compared in §6. Recent progress in defining and proving the security of key agreement protocols is reviewed in §7. §8 concludes with some directions for future research.



2 Goals of key agreement 

This section discusses in more detail the goals of asymmetric authenticated key 
establishment protocols. The complexity and variety of these goals explain in
part the difficulties involved in designing secure protocols. 

The fundamental goal of any authenticated key establishment protocol is to 
distribute keying data. Ideally, the established key should have precisely the same 
attributes as a key established face-to-face: for example, it should be shared by
the (two) specified entities, it should be distributed uniformly at random from 
the key space, and no unauthorized (and computationally bounded) entity should 
learn anything about the key. A protocol achieving this idealistic goal could then 
be used as a drop-in replacement for face-to-face key establishment without the 
need to review system security in much the same way as pseudorandom bit 
generators can replace random bit generators. 

Unfortunately, such an abstract goal is not easily attained and it is not an 
easy task to identify and enunciate the precise security requirements of authenticated key establishment. Nonetheless, over the years several concrete security and performance attributes have been identified as desirable. These are informally described in the remainder of this section. Recent, more formal attempts at capturing concrete security definitions are discussed in §7.

The first step is to identify what types of attacks it is vital for a protocol 
to withstand. Since protocols are used over open networks like the Internet, 
a secure protocol should be able to withstand both passive attacks (where an 
adversary attempts to prevent a protocol from achieving its goals by merely 
observing honest entities carrying out the protocol) and active attacks (where 
an adversary additionally subverts the communications by injecting, deleting, 
altering or replaying messages). 

The second step is to identify what concrete security goals it is vital for a pro- 
tocol to provide. The fundamental security goals described below are considered 
to be vital in any application. The other security and performance attributes are 
important in some environments, but less important in others. 

Fundamental security goals. Let A and B be two honest entities, i.e., legit- 
imate entities who execute the steps of a protocol correctly. 

1. implicit key authentication. A key agreement protocol is said to provide implicit key authentication (of B to A) if entity A is assured that no other
entity aside from a specifically identified second entity B can possibly learn 
the value of a particular secret key. Note that the property of implicit key 
authentication does not necessarily mean that A is assured of B actually 
possessing the key. 

2. key confirmation. A key agreement protocol is said to provide explicit key confirmation (of B to A) if entity A is assured that the second
entity B has actually computed the agreed key. The protocol provides im- 
plicit key confirmation if A is assured that B can compute the agreed key. 
While explicit key confirmation appears to provide stronger assurances to 
A than implicit key confirmation (in particular, the former implies the lat- 
ter), it appears that, for all practical purposes, the assurances are in fact the 
same. That is, the assurance that A requires in practice is merely that B can 
compute the key rather than that B has actually computed the key. Indeed 
in practice, even if a protocol does provide explicit key confirmation, it can- 
not guarantee to A that B will not lose the key between key establishment 
and key use. Thus it would indeed seem that implicit key confirmation and 
explicit key confirmation are in practice very similar, and the remainder of
this paper will not distinguish between the two. 

Key confirmation by itself is not a useful service; it is only desirable when accompanied by implicit key authentication. A key agreement protocol is
said to provide explicit key authentication (of B to A) if both implicit key 
authentication and key confirmation (of B to A) are provided. 

A key agreement protocol which provides implicit key authentication to both 
participating entities is called an authenticated key agreement (AK) protocol, 
while one providing explicit key authentication to both participating entities is 
called an authenticated key agreement with key confirmation (AKC) protocol. 

Key agreement protocols in which the services of implicit key authentication 
or explicit key authentication are provided to only one (unilateral) rather than 
both (mutual) participating entities are also useful in practice, for example in 
encryption applications where only authentication of the intended recipient is 
required. Such unilateral key agreement protocols (e.g., ElGamal key agreement 
[…, Protocol 12.52]) are not considered in this paper.

Other desirable security attributes. A number of other desirable security 
attributes have also been identified. Typically the importance of supplying these 
attributes will depend on the application. In the following, A and B are two 
honest entities. 

1. known-key security. Each run of a key agreement protocol between A and 
B should produce a unique secret key; such keys are called session keys. 
Session keys are desirable in order to limit the amount of data available for 
cryptanalytic attack (e.g., ciphertext generated using a fixed session key in 
an encryption application), and to limit exposure in the event of (session) 
key compromise. A protocol should still achieve its goal in the face of an 
adversary who has learned some other session keys. 

2. forward secrecy. If long-term private keys of one or more entities are compro- 
mised, the secrecy of previous session keys established by honest entities is 
not affected. A distinction is sometimes made between the scenario in which 
a single entity’s private key is compromised (half forward secrecy)
and the scenario in which the private keys of both participating entities are 
compromised (full forward secrecy). 

3. key-compromise impersonation. Suppose A’s long-term private key is disclosed. Clearly an adversary that knows this value can now impersonate A,
since it is precisely this value that identifies A. However, it may be desir- 
able in some circumstances that this loss does not enable the adversary to 
impersonate other entities to A. 

4. unknown key-share. Entity B cannot be coerced into sharing a key with entity A without B’s knowledge, i.e., when B believes the key is shared with some entity $C \neq A$, and A (correctly) believes the key is shared with B.

A hypothetical scenario where an unknown key-share attack can have damaging consequences is the following; this scenario was first described by Diffie, van Oorschot and Wiener. Suppose that B is a bank branch and A is an account holder. Certificates are issued by the bank headquarters and within
each certificate is the account information of the holder. Suppose that the 
protocol for electronic deposit of funds is to exchange a key with a bank 
branch via a mutually authenticated key agreement. Once B has authenti- 
cated the transmitting entity, encrypted funds are deposited to the account 
number in the certificate. Suppose that no further authentication is done in 
the encrypted deposit message (which might be the case to save bandwidth).
If the attack mentioned above is successfully launched then the deposit will 
be made to C’s account instead of A’s account. 

Desirable performance attributes. These include: 

1. Minimal number of passes (the number of messages exchanged). 

2. Low communication overhead (total number of bits transmitted). 

3. Low computation overhead (total number of arithmetical operations required).

4. Possibility of precomputation (to minimize on-line computational overhead). 

Other desirable attributes. These include: 

1. Anonymity of the entities participating in a run of the protocol. 

2. Role symmetry (the messages transmitted have the same structure). 

3. Non-interactiveness (the messages transmitted between the two entities are 
independent of each other). 

4. Non-reliance on encryption in order to meet export restrictions. 

5. Non-reliance on hash functions since these are notoriously hard to design. 

6. Non-reliance on timestamping since it is difficult to implement securely in 
practice. 

3 Diffie-Hellman key agreement

This section describes the basis of Diffie-Hellman based key agreement protocols 
and motivates the modern protocols we describe in §4 and §5 by illustrating some
of the deficiencies of early protocols. 

The mathematical tool commonly used for devising key agreement protocols is the Diffie-Hellman problem: given a cyclic group G of prime order n, a generator g of G, and elements $g^x, g^y \in G$ (where $x, y \in_R [1, n-1]$), find $g^{xy}$. (We use $x \in_R S$ to denote that x is chosen uniformly at random from the set S.) This problem is closely related to the widely-studied discrete logarithm problem (given G, n, g, and $g^x$ where $x \in_R [0, n-1]$, find x), and there is strong evidence that the two problems are computationally equivalent.

For concreteness, this paper deals with the case where G is a prime order 
subgroup of $\mathbb{Z}_p^*$, the multiplicative group of the integers modulo a prime p. However, the discussion applies equally well to any group of prime order in which the
discrete logarithm problem is computationally intractable, for example prime or- 
der subgroups of the group of points on an elliptic curve over a finite field. The 
following notation is used throughout the paper. 
A, B Honest entities.

p 1024-bit prime.

q 160-bit prime divisor of $p-1$.

g An element of order q in $\mathbb{Z}_p^*$.

a, b Static private keys of A and B; $a, b \in_R [1, q-1]$.

$Y_A$, $Y_B$ Static public keys of A and B; $Y_A = g^a \bmod p$, $Y_B = g^b \bmod p$.

x, y Ephemeral private keys of A and B; $x, y \in_R [1, q-1]$.

$R_A$, $R_B$ Ephemeral public keys of A and B; $R_A = g^x \bmod p$, $R_B = g^y \bmod p$.

H A cryptographic hash function (e.g., SHA-1).

MAC A message authentication code algorithm (e.g., …).

The operator mod p will henceforth be omitted. 

The domain parameters (p, q, g) are common to all entities. For the remainder 
of this paper, we will assume that static public keys are exchanged via certifi- 
cates. CertA denotes A’s public-key certificate, containing a string of information 
that uniquely identifies A (such as A’s name and address), her static public key 
Ya, and a certifying authority CA’s signature over this information. Other in- 
formation may be included in the data portion of the certificate, including the 
domain parameters if these are not known from context. Any other entity B can 
use his authentic copy of the CA’s public key to verify A’s certificate, thereby 
obtaining an authentic copy of A’s static public key. 

We assume that the CA has verified that A possesses the private key a corresponding to her static public key $Y_A$. This is done in order to prevent potential unknown key-share attacks whereby an adversary E registers A’s public key $Y_A$ as its own and subsequently deceives B into believing that A’s messages originated from E. Checking knowledge of private keys is in general a sensible precaution and is often vital for theoretical analysis. We also assume that the CA has verified the validity of A’s static public key $Y_A$, i.e., the CA has verified that $1 < Y_A < p$ and that $(Y_A)^q \equiv 1 \pmod p$; this process is called public key validation.
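Public key validation amounts to two cheap checks. The sketch below assumes hypothetical toy parameters p = 23, q = 11, g = 4 (g has order 11 modulo 23), far too small for real use. It shows that exactly the ten non-identity elements of the order-q subgroup pass, while 1, the order-2 element $p-1$, and out-of-range representatives such as $g + p$ are rejected.

```python
def validate_public_key(Y, p, q):
    """Public key validation: 1 < Y < p and Y^q = 1 (mod p)."""
    return 1 < Y < p and pow(Y, q, p) == 1


# Hypothetical toy parameters only: g = 4 has prime order q = 11 modulo p = 23.
P, Q, G = 23, 11, 4
valid_keys = sorted(Y for Y in range(P + 30) if validate_public_key(Y, P, Q))
print(valid_keys)
```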

The first asymmetric key agreement protocol was proposed by Diffie and Hellman in their seminal 1976 paper. We present two versions of the basic
protocol, one where the entities exchange ephemeral (short-term) public keys, 
and the other where the entities exchange static (long-term) public keys. 

Protocol 1 (Ephemeral Diffie-Hellman)

1. A selects $x \in_R [1, q-1]$ and sends $R_A = g^x$ to B.

2. B selects $y \in_R [1, q-1]$ and sends $R_B = g^y$ to A.

3. A computes $K = (R_B)^x = g^{xy}$.

4. B computes $K = (R_A)^y = g^{xy}$.

While the ephemeral Diffie-Hellman protocol provides implicit key authentication in the presence of passive adversaries, it does not on its own provide any useful services in the presence of active adversaries, since neither entity is provided with any assurances regarding the identity of the entity it is communicating with. (See also the comparison table in §6.) This drawback can be overcome by using public keys that have been certified by a trusted CA.
Fig. 1. Protocol 1 (Ephemeral Diffie-Hellman). [A, holding x, sends $g^x$ to B; B, holding y, sends $g^y$ to A; both compute $K = g^{xy}$.]
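A toy run of Protocol 1 makes the key computation concrete. This is a minimal sketch assuming hypothetical toy parameters p = 23, q = 11, g = 4 (g has order 11 modulo 23); such small values offer no security and only show that both sides arrive at $g^{xy}$.

```python
import secrets

P, Q, G = 23, 11, 4   # hypothetical toy group: G has prime order Q modulo P


def random_exponent():
    """Uniform exponent in [1, Q-1]."""
    return secrets.randbelow(Q - 1) + 1


def ephemeral_dh():
    x = random_exponent()            # A's ephemeral private key
    y = random_exponent()            # B's ephemeral private key
    R_A = pow(G, x, P)               # A -> B
    R_B = pow(G, y, P)               # B -> A
    K_A = pow(R_B, x, P)             # A computes (R_B)^x
    K_B = pow(R_A, y, P)             # B computes (R_A)^y
    return K_A, K_B, pow(G, x * y, P)


print(ephemeral_dh())
```

Nothing in the exchange identifies either party, which is exactly why an active adversary can interfere, as the attacks later in this section show.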



Protocol 2 (Static Diffie-Hellman)

1. A sends $Cert_A$ to B.

2. B sends $Cert_B$ to A.

3. A computes $K = (Y_B)^a = g^{ab}$.

4. B computes $K = (Y_A)^b = g^{ab}$.



Fig. 2. Protocol 2 (Static Diffie-Hellman). [A, holding a, and B, holding b, exchange certificates containing $g^a$ and $g^b$; both compute $K = g^{ab}$.]



Since each entity is assured that it possesses an authentic copy of the other 
entity’s public key, the static Diffie-Hellman protocol offers implicit key authentication. A major drawback, however, is that A and B compute the same shared secret $K = g^{ab}$ for each run of the protocol.

The drawbacks of the ephemeral and static Diffie-Hellman protocols can be 
alleviated by using both static and ephemeral keying material in the formation 
of shared secrets. An example of an early protocol designed in this manner is 
the MTI/C0 protocol.

Protocol 3 (MTI/C0)

1. A selects $x \in_R [1, q-1]$ and sends $T_A = (Y_B)^x$ to B.

2. B selects $y \in_R [1, q-1]$ and sends $T_B = (Y_A)^y$ to A.

3. A computes $K = (T_B)^{a^{-1}x} = g^{xy}$.

4. B computes $K = (T_A)^{b^{-1}y} = g^{xy}$.



Fig. 3. Protocol 3 (MTI/C0). [A, holding a and x, sends $(g^b)^x$ to B; B, holding b and y, sends $(g^a)^y$ to A; both compute $K = g^{xy}$.]
This protocol appears secure at first glance. Unfortunately, it turns out that this attempt to combine static and ephemeral Diffie-Hellman protocols has introduced some subtle problems. As an example, consider the following instance of the small subgroup attack on the MTI/C0 protocol. An adversary E replaces $T_A$ and $T_B$ with the identity element 1. Both A and B now form $K = 1$, which is also known to E. This attack demonstrates that the MTI/C0 protocol (as described above) does not offer implicit key authentication.
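The following sketch runs MTI/C0 in a hypothetical toy group (p = 23, q = 11, g = 4) and then replays the small subgroup attack described above by substituting 1 for both $T_A$ and $T_B$. The inverses $a^{-1}$ and $b^{-1}$ are taken modulo q with Python 3.8+'s three-argument `pow` and exponent -1; function and parameter names are illustrative.

```python
P, Q, G = 23, 11, 4   # hypothetical toy group: G has prime order Q modulo P


def mti_c0_keys(a, b, x, y, T_A=None, T_B=None):
    """Return the keys (K_A, K_B) computed by A and B in MTI/C0.

    T_A and T_B default to the honest messages; passing other values models
    an adversary who replaces them in transit.
    """
    Y_A, Y_B = pow(G, a, P), pow(G, b, P)
    if T_A is None:
        T_A = pow(Y_B, x, P)                       # A -> B: (Y_B)^x
    if T_B is None:
        T_B = pow(Y_A, y, P)                       # B -> A: (Y_A)^y
    K_A = pow(T_B, pow(a, -1, Q) * x % Q, P)       # (T_B)^(a^-1 x)
    K_B = pow(T_A, pow(b, -1, Q) * y % Q, P)       # (T_A)^(b^-1 y)
    return K_A, K_B


honest = mti_c0_keys(a=3, b=5, x=7, y=9)
attacked = mti_c0_keys(a=3, b=5, x=7, y=9, T_A=1, T_B=1)
print(honest, attacked)   # (9, 9) (1, 1)
```

In the attacked run both parties compute $1^{e} = 1$ for any exponent e, so E knows the key without knowing any private value.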

The three protocols described in this section demonstrate some of the subtleties involved in designing secure authenticated key agreement protocols. Other kinds of attacks that have been identified besides small subgroup attacks include:

1. intruder-in-the-middle attack. In this classic attack on ephemeral Diffie-Hellman, the adversary replaces A’s and B’s ephemeral keys $g^x$ and $g^y$ with keys $g^{x'}$ and $g^{y'}$ of its choice. E can then compute the session keys formed by A and B ($g^{xy'}$ and $g^{x'y}$, respectively), and use these to translate messages exchanged between A and B that are encrypted under the session keys.

2. reflection attack. Challenges issued by A are replayed back to A as messages purportedly from B.

3. interleaving attack. The adversary reuses messages transmitted during a run of the protocol in other runs of the protocol.



Such attacks are typically very subtle and require little computational over- 
head. They highlight the necessity of some kind of formal analysis to avoid the 
use of flawed protocols. 
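As a concrete illustration of the first of these, the intruder-in-the-middle attack on ephemeral Diffie-Hellman can be replayed in a hypothetical toy group (p = 23, q = 11, g = 4). The sketch is illustrative only: E substitutes $g^{x'}$ and $g^{y'}$ for the honest ephemeral keys and can then compute both session keys.

```python
import secrets

P, Q, G = 23, 11, 4   # hypothetical toy group: G has prime order Q modulo P


def rand_exp():
    """Uniform exponent in [1, Q-1]."""
    return secrets.randbelow(Q - 1) + 1


x, y = rand_exp(), rand_exp()          # honest ephemeral keys of A and B
x_e, y_e = rand_exp(), rand_exp()      # E's substitute exponents x' and y'
R_A, R_B = pow(G, x, P), pow(G, y, P)  # what A and B actually send

# E intercepts both messages and forwards its own keys instead.
to_B, to_A = pow(G, x_e, P), pow(G, y_e, P)
K_A = pow(to_A, x, P)   # A's session key, g^(x y')
K_B = pow(to_B, y, P)   # B's session key, g^(x' y)

# E computes both keys from the intercepted R_A, R_B and its own exponents.
E_key_with_A = pow(R_A, y_e, P)
E_key_with_B = pow(R_B, x_e, P)
```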



4 AK protocols 

This section discusses some AK protocols currently proposed in standards. We 
present the two-pass KEA, Unified Model, and MQV protocols, and their one-pass variants.

Before we present the AK protocols it is worth reminding the reader that, as 
discussed in §2, it is highly desirable for key establishment protocols to provide
explicit key authentication. Thus, when AK protocols are used in practice, key 
confirmation should usually be added to the protocols. Nonetheless it is worth 
presenting the raw AK protocols since key confirmation can be achieved in a 
variety of ways and it is sometimes desirable to separate key confirmation from 
implicit key authentication and move the burden of key confirmation from the 
key establishment mechanism to the application. For example, if the key is to be 
subsequently used to achieve confidentiality, then encryption with the key can 
begin on some (carefully chosen) known data. Other systems may provide key 
confirmation during a ‘real-time’ telephone conversation. We present a generic method for securely incorporating key confirmation into AK protocols in §5.

4.1 KEA 

The Key Exchange Algorithm (KEA) was designed by the National Security Agency (NSA) and declassified in May 1998. It is the key agreement protocol in the FORTEZZA suite of cryptographic algorithms designed by NSA in 1994. KEA is very similar to the Goss and MTI/A0 protocols.

Protocol 4 (KEA)

1. A and B obtain authentic copies of each other’s public keys $Y_A$ and $Y_B$.

2. A selects $x \in_R [1, q-1]$ and sends $R_A = g^x$ to B.

3. B selects $y \in_R [1, q-1]$ and sends $R_B = g^y$ to A.

4. A verifies that $1 < R_B < p$ and $(R_B)^q \equiv 1 \pmod p$. If any check fails, then A terminates the protocol run with failure. Otherwise, A computes the shared secret $K = (Y_B)^x + (R_B)^a \bmod p$. If $K = 0$, then A terminates the protocol run with failure.

5. B verifies that $1 < R_A < p$ and $(R_A)^q \equiv 1 \pmod p$. If any check fails, then B terminates the protocol run with failure. Otherwise, B computes the shared secret $K = (Y_A)^y + (R_A)^b \bmod p$. If $K = 0$, then B terminates the protocol run with failure.

6. Both A and B compute the 80-bit session key $k = \mathrm{kdf}(K)$, where kdf is a key derivation function derived from the symmetric-key encryption scheme SKIPJACK (see the SKIPJACK and KEA specifications for further details).



Fig. 4. Protocol 4 (KEA). [A holds a and x, B holds b and y; they exchange $g^x$ and $g^y$, and both compute $K = g^{ay} + g^{bx}$.]
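The shared-secret computation in steps 4 and 5 of KEA can be sketched as follows, over a hypothetical toy group (p = 23, q = 11, g = 4). The SKIPJACK-based kdf is not modelled, and the function names are illustrative.

```python
P, Q, G = 23, 11, 4   # hypothetical toy group: G has prime order Q modulo P


def check_ephemeral(R):
    """Validation from steps 4/5: 1 < R < P and R^Q = 1 (mod P)."""
    return 1 < R < P and pow(R, Q, P) == 1


def kea_shared_secret(own_static, own_eph, peer_static_pub, peer_eph_pub):
    """K = (Y_peer)^(own ephemeral) + (R_peer)^(own static) mod P."""
    if not check_ephemeral(peer_eph_pub):
        raise ValueError("peer ephemeral key failed validation")
    K = (pow(peer_static_pub, own_eph, P) + pow(peer_eph_pub, own_static, P)) % P
    if K == 0:
        raise ValueError("K = 0")
    return K   # KEA then derives the session key kdf(K); kdf is omitted here


a, b, x, y = 3, 5, 7, 9
Y_A, Y_B = pow(G, a, P), pow(G, b, P)
R_A, R_B = pow(G, x, P), pow(G, y, P)
K_A = kea_shared_secret(a, x, Y_B, R_B)   # g^{bx} + g^{ay}
K_B = kea_shared_secret(b, y, Y_A, R_A)   # g^{ay} + g^{bx}
print(K_A, K_B)   # 5 5
```

With these toy values $g^{ay} = 12$ and $g^{bx} = 16$, so both sides obtain $K = 28 \bmod 23 = 5$; an ephemeral key outside the subgroup, such as $p - 1$, is rejected before any exponentiation with the static key.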



To illustrate the need for the features of KEA, we demonstrate how the 
protocol is weakened when certain modifications are made. This serves to further 
illustrate that designing secure key agreement protocols is a delicate and difficult 
task, and that subtle changes to a protocol can render it insecure. 

Validation of public keys: verifying that they lie in the subgroup of order q. Suppose that A does not verify that $(R_B)^q \equiv 1 \pmod p$. Then, as observed by Lim and Lee, it may be possible for a malicious B to learn information about A’s static private key a as follows, using a variant of the small subgroup attack. Suppose that $p-1$ has a prime factor $l$ of small bitlength (e.g., 40 bits). Let $\beta \in \mathbb{Z}_p^*$ be of order $l$. If B sends $R_B = \beta$ to A, then A computes $K = g^{bx} + \beta^a \bmod p$ and $k = \mathrm{kdf}(K)$. Suppose now that A sends B an encrypted message $c = E_k(m)$, where E is a symmetric-key encryption scheme and the plaintext m has some recognizable structure. For each d, $0 \le d \le l-1$, B computes $K' = g^{bx} + \beta^d \bmod p$, $k' = \mathrm{kdf}(K')$, and $m' = E^{-1}_{k'}(c)$. If m' possesses the requisite structure, then B concludes that $d = a \bmod l$, thus learning some partial information about a. This can be repeated for different small prime factors $l$ of $p-1$.
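The Lim-Lee attack can be simulated end to end in a toy setting. Everything here is a hypothetical stand-in chosen for illustration: p = 2311 with $p - 1 = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11$ and q = 11; SHA-256 in place of KEA's kdf; and an XOR "cipher" in place of E. B learns $a \bmod l$ for each small l, and combining the residues with the Chinese remainder theorem gives $a \bmod 210$, which in this toy case (a < q < 210) is a itself. Real parameters would leak only partial information. Python 3.8+ is assumed for `pow(m, -1, n)`.

```python
import hashlib

P, Q = 2311, 11                  # hypothetical toy parameters: P - 1 = 2*3*5*7*11
assert all(P % d for d in range(2, int(P ** 0.5) + 1)), "P must be prime"


def element_of_order(order):
    """Return an element of Z_P^* of exact prime order `order` (order | P - 1)."""
    for h in range(2, P):
        e = pow(h, (P - 1) // order, P)
        if e != 1:
            return e
    raise ValueError("no element of that order")


def kdf(K):
    # Hypothetical stand-in for KEA's SKIPJACK-based kdf.
    return hashlib.sha256(K.to_bytes(2, "big")).digest()


def toy_cipher(key, data):
    # Hypothetical toy cipher: XOR with key bytes (encrypts and decrypts).
    return bytes(d ^ k for d, k in zip(data, key))


g = element_of_order(Q)
a, b = 7, 4                          # A's static key (target); malicious B's key
Y_B = pow(g, b, P)
MESSAGE = b"PAY 100 TO ACCOUNT 42"   # plaintext with recognizable structure


def recover_a_mod(l, x=5):
    """One KEA run in which A skips the check (R_B)^Q = 1; returns a mod l."""
    beta = element_of_order(l)                   # B sends R_B = beta, order l
    R_A = pow(g, x, P)                           # A's ephemeral public key
    K = (pow(Y_B, x, P) + pow(beta, a, P)) % P   # A's K = g^{bx} + beta^a
    c = toy_cipher(kdf(K), MESSAGE)              # A sends c = E_k(m)
    g_bx = pow(R_A, b, P)                        # B knows g^{bx} = (R_A)^b
    for d in range(l):
        if toy_cipher(kdf((g_bx + pow(beta, d, P)) % P), c) == MESSAGE:
            return d
    return None


def crt(pairs):
    """Combine residues r_i mod n_i (pairwise coprime) into x mod prod(n_i)."""
    x, m = 0, 1
    for r, n in pairs:
        t = (r - x) * pow(m, -1, n) % n
        x, m = x + m * t, m * n
    return x, m


residues = [(recover_a_mod(l), l) for l in (2, 3, 5, 7)]
print(residues, crt(residues))   # [(1, 2), (1, 3), (2, 5), (0, 7)] (7, 210)
```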
Validation of public keys: verifying that they lie in the interval $[2, p-1]$. Suppose that A does not verify that $1 < Y_B < p$ and $1 < R_B < p$. Then an adversary E can launch the following unknown key-share attack. E gets $Y_E = 1$ certified as its static public key. E then forwards A’s ephemeral public key $R_A$ to B, alleging it came from E. After B replies to E with $R_B$, E sends 1 to A as B’s ephemeral public key, alleging it came from B. A computes $K_{AB} = g^{bx} + 1$ and B computes $K_{BE} = g^{bx} + 1$. Thus B is coerced into sharing a key with A without B’s knowledge.
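The unknown key-share attack above can be reproduced with a deliberately flawed KEA variant that omits the interval checks. This is a sketch over a hypothetical toy group (p = 23, q = 11, g = 4) with illustrative names. Note that $1^q = 1$, so the subgroup test alone would accept 1; only the interval check $1 < R < p$ rejects it.

```python
P, Q, G = 23, 11, 4   # hypothetical toy group: G has prime order Q modulo P


def kea_secret_no_interval_check(own_static, own_eph, peer_static_pub, peer_eph_pub):
    """Flawed KEA variant: omits the checks 1 < Y < P and 1 < R < P."""
    return (pow(peer_static_pub, own_eph, P) + pow(peer_eph_pub, own_static, P)) % P


a, b, x, y = 3, 5, 7, 9
Y_B = pow(G, b, P)
R_A = pow(G, x, P)
Y_E = 1                 # E gets 1 certified as its static public key

# B believes it is talking to E: it receives R_A (forwarded by E) and uses Y_E.
K_B = kea_secret_no_interval_check(b, y, Y_E, R_A)   # 1 + g^{bx}
# A believes it is talking to B: E replaces B's ephemeral key with 1.
K_A = kea_secret_no_interval_check(a, x, Y_B, 1)     # g^{bx} + 1
print(K_A == K_B)       # True
```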

Use of a key derivation function. The key derivation function kdf is used to derive a session key from the shared secret key K. One reason for doing this is to mix together strong bits and potential weak bits of K; weak bits are certain bits of information about K that can be correctly predicted with non-negligible advantage.

Another reason is to destroy the algebraic relationships between the shared secret K and the static and ephemeral public keys. This can help prevent some kinds of known-key attacks, such as Burmester’s triangle attack, which we describe next. An adversary E, whose static key pair is $(c, g^c)$, observes a run of the protocol between A and B in which ephemeral public keys $g^x$ and $g^y$ are exchanged; the resulting shared secret is $K_{AB} = g^{ay} + g^{bx}$. E then initiates a run of the protocol with A, replaying $g^y$ as its ephemeral public key; the resulting secret, which only A can compute, is $K_{AE} = g^{ay} + g^{cx'}$, where $g^{x'}$ is A’s ephemeral public key. Similarly, E initiates a run of the protocol with B, replaying $g^x$ as its ephemeral public key; the resulting secret, which only B can compute, is $K_{BE} = g^{bx} + g^{cy'}$, where $g^{y'}$ is B’s ephemeral public key. If E can somehow learn $K_{AE}$ and $K_{BE}$ (this is the known-key portion of the attack), then E can compute $K_{AB} = K_{AE} + K_{BE} - g^{cx'} - g^{cy'} \pmod p$.
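The algebra of the triangle attack can be checked directly on the raw KEA secrets, with no kdf applied, which is exactly the situation the kdf is meant to prevent. This is a sketch over a hypothetical toy group (p = 23, q = 11, g = 4); the key values are arbitrary illustrative choices.

```python
P, Q, G = 23, 11, 4   # hypothetical toy group: G has prime order Q modulo P


def kea_K(own_static, own_eph, peer_static_pub, peer_eph_pub):
    """Raw KEA shared secret (no kdf): (Y_peer)^own_eph + (R_peer)^own_static."""
    return (pow(peer_static_pub, own_eph, P) + pow(peer_eph_pub, own_static, P)) % P


a, b, c = 2, 3, 4      # static private keys of A, B and the adversary E
x, y = 5, 6            # ephemeral keys in the observed A-B run
x2, y2 = 7, 8          # A's and B's ephemeral keys (x', y') in the runs with E
Y_B, Y_E = pow(G, b, P), pow(G, c, P)
R_A, R_B = pow(G, x, P), pow(G, y, P)

K_AB = kea_K(a, x, Y_B, R_B)      # observed run: g^{ay} + g^{bx}
K_AE = kea_K(a, x2, Y_E, R_B)     # E replays g^y to A: g^{ay} + g^{cx'}
K_BE = kea_K(b, y2, Y_E, R_A)     # E replays g^x to B: g^{bx} + g^{cy'}

# Having learned K_AE and K_BE, E removes the terms it can compute itself.
R_A2, R_B2 = pow(G, x2, P), pow(G, y2, P)   # ephemeral keys A and B sent to E
recovered = (K_AE + K_BE - pow(R_A2, c, P) - pow(R_B2, c, P)) % P
print(recovered == K_AB)   # True
```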

The check that $K \neq 0$. This check is actually unnecessary, as the following argument shows. Since g has order q, we have $(g^{ay})^q \equiv (g^{bx})^q \equiv 1 \pmod p$. Now, $K = 0$ if and only if $g^{bx} \equiv -g^{ay} \pmod p$. But this is impossible, since otherwise $1 \equiv (g^{bx})^q \equiv (-g^{ay})^q \equiv (-1)^q (g^{ay})^q \equiv -1 \pmod p$, because q is odd.
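The argument can be confirmed exhaustively in a hypothetical toy group (p = 23, q = 11, g = 4): since $(-1)^q \equiv -1$ for odd q, the element $p - 1$ is not in the order-q subgroup, so no two subgroup elements sum to 0 modulo p.

```python
P, Q, G = 23, 11, 4                          # hypothetical toy group; Q is odd
subgroup = {pow(G, i, P) for i in range(Q)}  # the order-Q subgroup generated by G

# (-1)^Q = -1 (mod P) for odd Q, so P - 1 (that is, -1) is not in the subgroup.
minus_one_in_subgroup = (P - 1) in subgroup

# K = g^{ay} + g^{bx} = 0 would need subgroup elements u, v with u = -v (mod P).
zero_sums = [(u, v) for u in subgroup for v in subgroup if (u + v) % P == 0]
print(minus_one_in_subgroup, zero_sums)      # False []
```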

Security notes. KEA does not provide (full) forward secrecy, since an adversary who learns a and b can compute all session keys established by A and B. See also the comparison table in §6.



4.2 The Unified Model 

The Unified Model, proposed by Ankney, Johnson and Matyas is an AK 
protocol that is in the draft standards ANSI X9.42 ANSI X9.63 Q, and 

IEEE PI363 One of its advantages is that it is conceptually simple and 

consequently easier to analyze (see ^ 3 . 

Protocol 5 (Unified Model)

1. A selects $x \in_R [1, q-1]$ and sends $R_A = g^x$ and $Cert_A$ to B.

2. B selects $y \in_R [1, q-1]$ and sends $R_B = g^y$ and $Cert_B$ to A.

3. A verifies that $1 < R_B < p$ and $(R_B)^q \equiv 1 \pmod p$. If any check fails, then A terminates the protocol run with failure. Otherwise, A computes the session key $k = H((Y_B)^a \| (R_B)^x)$.

4. B verifies that $1 < R_A < p$ and $(R_A)^q \equiv 1 \pmod p$. If any check fails, then B terminates the protocol run with failure. Otherwise, B computes the session key $k = H((Y_A)^b \| (R_A)^y)$.



Fig. 5. Protocol 5 (Unified Model). A (static key a, ephemeral key x) and B (static key b, ephemeral key y) exchange $g^x$ and $g^y$; both compute $k = H(g^{ab} \| g^{xy})$.
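A minimal sketch of Protocol 5 follows. It assumes toy parameters (p = 2039, q = 1019, g = 4) and uses SHA-256 over a '||'-joined decimal encoding as a stand-in for H; neither choice comes from the standards cited above. Certificates are omitted: static public keys are simply assumed to be authentic.

```python
import hashlib
import random

# Toy parameters (assumption): p = 2q + 1, g = 4 generates the order-q subgroup.
P, Q, G = 2039, 1019, 4


def H(*elements):
    """Stand-in for H: SHA-256 over a '||'-joined encoding of group elements."""
    return hashlib.sha256("||".join(map(str, elements)).encode()).hexdigest()


def valid(r):
    """Public-key validation of steps 3 and 4: 1 < R < p and R^q = 1 (mod p)."""
    return 1 < r < P and pow(r, Q, P) == 1


def unified_model(seed):
    rng = random.Random(seed)
    a, b = rng.randrange(1, Q), rng.randrange(1, Q)   # static private keys
    ya, yb = pow(G, a, P), pow(G, b, P)               # static public keys
    x, y = rng.randrange(1, Q), rng.randrange(1, Q)   # ephemeral private keys
    ra, rb = pow(G, x, P), pow(G, y, P)               # flows 1 and 2
    if not (valid(ra) and valid(rb)):
        raise ValueError("public-key validation failed")
    k_a = H(pow(yb, a, P), pow(rb, x, P))             # A: H((Y_B)^a || (R_B)^x)
    k_b = H(pow(ya, b, P), pow(ra, y, P))             # B: H((Y_A)^b || (R_A)^y)
    return k_a, k_b


k_a, k_b = unified_model(seed=7)
print(k_a == k_b)  # True
```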



Security notes. The Unified Model does not provide the service of key-com- 
promise impersonation, since an adversary who learns a can impersonate any 
other entity B to A. See also Table 1.
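The key-compromise impersonation weakness can be reproduced in a few lines. In this sketch (toy parameters and a SHA-256 stand-in for H, both assumptions; function names hypothetical), E knows only a and public values, yet derives the same key that A derives for her supposed run with B.

```python
import hashlib
import random

P, Q, G = 2039, 1019, 4   # toy parameters (assumption)


def H(*elements):
    return hashlib.sha256("||".join(map(str, elements)).encode()).hexdigest()


def kci_impersonation(seed):
    rng = random.Random(seed)
    a, b = rng.randrange(1, Q), rng.randrange(1, Q)
    yb = pow(G, b, P)                       # B's certified public key
    x = rng.randrange(1, Q)
    ra = pow(G, x, P)                       # A's flow 1, intended for B

    # E has learned a (but not b). She picks y and answers in B's name.
    y = rng.randrange(1, Q)
    rb = pow(G, y, P)

    k_a = H(pow(yb, a, P), pow(rb, x, P))   # A's key, believing her peer is B
    k_e = H(pow(yb, a, P), pow(ra, y, P))   # E's key, from a, y and public values only
    return k_a, k_e


k_a, k_e = kci_impersonation(seed=3)
print(k_a == k_e)  # True
```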

4.3 MQV 

The so-called MQV protocol is an AK protocol that is in the draft standards 
ANSI X9.42, ANSI X9.63 and IEEE P1363. The following notation 
is used. If $X \in [1, p-1]$, then (for a 160-bit q) $\bar{X} = (X \bmod 2^{80}) + 2^{80}$; more generally, 
$\bar{X} = (X \bmod 2^{\lceil f/2 \rceil}) + 2^{\lceil f/2 \rceil}$, where f is the bitlength of q. Note that $(\bar{X} \bmod q) \neq 0$.
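The truncation $\bar{X}$ is straightforward to compute. The helper below (the name bar is hypothetical) implements the general formula; for a 160-bit q it reduces to $(X \bmod 2^{80}) + 2^{80}$, an exponent of about 81 bits, which is the source of the half-length exponentiation mentioned in the security notes. The value q160 is not prime and is used only for its bitlength.

```python
def bar(X: int, q: int) -> int:
    """X-bar = (X mod 2^ceil(f/2)) + 2^ceil(f/2), where f is the bitlength of q."""
    half = (q.bit_length() + 1) // 2          # ceil(f / 2)
    return (X % (1 << half)) + (1 << half)


# For a 160-bit q, X-bar always lies in [2^80, 2^81).
q160 = (1 << 159) + 1                          # only its bitlength (160) matters here
print(bar(12345678901234567890123, q160) >= 1 << 80, bar(2**200 - 1, q160) < 1 << 81)  # True True

# Tiny example with q = 1019 (f = 10 bits, ceil(f/2) = 5): 2038 mod 32 = 22, plus 32.
print(bar(2038, 1019))   # 54
```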

Protocol 6 (MQV) 

1. A selects $x \in_R [1, q-1]$ and sends $R_A = g^x$ and $\mathrm{Cert}_A$ to B. 

2. B selects $y \in_R [1, q-1]$ and sends $R_B = g^y$ and $\mathrm{Cert}_B$ to A. 

3. A verifies that $1 < R_B < p$ and $(R_B)^q \equiv 1 \pmod p$. If any check fails, 
then A terminates the protocol run with failure. Otherwise, A computes 
$s_A = (x + a\bar{R}_A) \bmod q$ and the shared secret $K = (R_B (Y_B)^{\bar{R}_B})^{s_A} \bmod p$. If $K = 1$, 
then A terminates the protocol run with failure. 

4. B verifies that $1 < R_A < p$ and $(R_A)^q \equiv 1 \pmod p$. If any check fails, 
then B terminates the protocol run with failure. Otherwise, B computes 
$s_B = (y + b\bar{R}_B) \bmod q$ and the shared secret $K = (R_A (Y_A)^{\bar{R}_A})^{s_B} \bmod p$. If $K = 1$, 
then B terminates the protocol run with failure. 

5. The session key is $k = H(K)$.

Security notes. The expression for $\bar{R}_A$ uses only half the bits of $R_A$. This was 
done in order to increase the efficiency of computing K, because the modular 
exponentiation $(Y_A)^{\bar{R}_A}$ can be done in half the time of a full exponentiation. 
The modification does not appear to affect the security of the protocol. The 
definition of $\bar{R}_A$ implies that $(\bar{R}_A \bmod q) \neq 0$; this ensures that the contribution of the 
static private key a is not cancelled in the formation of $s_A$.
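Fig. 6. Protocol 6 (MQV). A (static key a, ephemeral key x) sends $g^x$ and B (static key b, ephemeral key y) sends $g^y$; A computes $s_A = (x + a\overline{g^x}) \bmod q$ and $K = (g^y (g^b)^{\overline{g^y}})^{s_A}$, and B computes $s_B = (y + b\overline{g^y}) \bmod q$ and $K = (g^x (g^a)^{\overline{g^x}})^{s_B}$.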



The check that $K \neq 1$ ensures that K has order q.
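An end-to-end toy run of Protocol 6 may help make the formulas concrete. This is a sketch under stated assumptions: toy parameters p = 2039, q = 1019, g = 4 (so $\lceil f/2 \rceil = 5$), SHA-256 standing in for H, and certificates omitted. Each side returns None where the protocol says to terminate; since both sides compute the same K, they always agree on the outcome.

```python
import hashlib
import random

# Toy parameters (assumption): p = 2q + 1, g = 4 generates the order-q subgroup.
P, Q, G = 2039, 1019, 4
HALF = (Q.bit_length() + 1) // 2              # ceil(f/2) for the bitlength f of q


def bar(X):
    return (X % (1 << HALF)) + (1 << HALF)


def mqv_key(own_static, own_eph, own_eph_pub, peer_static_pub, peer_eph_pub):
    """One side of Protocol 6; returns k = H(K), or None if the run must be aborted."""
    if not (1 < peer_eph_pub < P and pow(peer_eph_pub, Q, P) == 1):
        return None                                        # validation of R_peer failed
    s = (own_eph + own_static * bar(own_eph_pub)) % Q      # s_A = (x + a*bar(R_A)) mod q
    base = peer_eph_pub * pow(peer_static_pub, bar(peer_eph_pub), P) % P
    K = pow(base, s, P)                                    # K = (R_B * Y_B^bar(R_B))^s_A
    if K == 1:
        return None
    return hashlib.sha256(str(K).encode()).hexdigest()     # SHA-256 stands in for H


def mqv(seed):
    rng = random.Random(seed)
    a, b, x, y = (rng.randrange(1, Q) for _ in range(4))
    ya, yb, ra, rb = (pow(G, e, P) for e in (a, b, x, y))
    return mqv_key(a, x, ra, yb, rb), mqv_key(b, y, rb, ya, ra)


k_a, k_b = mqv(seed=3)
print(k_a == k_b)  # True
```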

Kaliski has recently observed that Protocol 6 does not possess the unknown 
key-share attribute. This is demonstrated by the following on-line attack. 
An adversary E intercepts A's ephemeral public key $R_A$ intended for B, and 
computes $R_E = R_A (Y_A)^{\bar{R}_A} g^{-1}$, $e = (\bar{R}_E)^{-1} \bmod q$, and $Y_E = g^e$. E then gets 
$Y_E$ certified as her static public key (note that E knows the corresponding private 
key e), and transmits $R_E$ to B. B responds by sending $R_B$ to E, which 
E forwards to A. Both A and B compute the same session key k, since 
$R_E (Y_E)^{\bar{R}_E} = R_A (Y_A)^{\bar{R}_A} g^{-1} g^{e\bar{R}_E} = R_A (Y_A)^{\bar{R}_A} \pmod p$; however, B 
mistakenly believes that he shares k with E. We emphasize that lack of the unknown 
key-share attribute does not contradict the fundamental goal of mutual 
implicit key authentication: by definition, the provision of implicit key authentication 
is only considered in the case where B engages in the protocol with an 
honest entity (which E is not). If an application using Protocol 6 is concerned 
with the lack of the unknown key-share attribute under such on-line attacks, 
then appropriate key confirmation should be added, for example as specified in 
Protocol 8.
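Kaliski's attack can be simulated directly. The sketch below assumes toy parameters (p = 2039, q = 1019, g = 4), SHA-256 for H, and a certification authority that issues a certificate for $Y_E$ without any proof of possession, as the attack requires; helper names are hypothetical. It needs Python 3.8+ for modular inverses via pow(x, -1, m). In the rare run where $R_E$ equals 1, B's validation rejects it, and that run is simply skipped.

```python
import hashlib
import random

# Toy parameters (assumption): p = 2q + 1, g = 4 generates the order-q subgroup.
P, Q, G = 2039, 1019, 4
HALF = (Q.bit_length() + 1) // 2


def bar(X):
    return (X % (1 << HALF)) + (1 << HALF)


def mqv_key(own_static, own_eph, own_eph_pub, peer_static_pub, peer_eph_pub):
    """One side of Protocol 6; returns H(K), or None where the protocol aborts."""
    if not (1 < peer_eph_pub < P and pow(peer_eph_pub, Q, P) == 1):
        return None
    s = (own_eph + own_static * bar(own_eph_pub)) % Q
    K = pow(peer_eph_pub * pow(peer_static_pub, bar(peer_eph_pub), P) % P, s, P)
    return None if K == 1 else hashlib.sha256(str(K).encode()).hexdigest()


def kaliski_attack(seed):
    rng = random.Random(seed)
    a, b, x, y = (rng.randrange(1, Q) for _ in range(4))
    ya, yb = pow(G, a, P), pow(G, b, P)
    ra = pow(G, x, P)                                # A -> "B", intercepted by E

    # E: R_E = R_A * Y_A^bar(R_A) * g^-1,  e = bar(R_E)^-1 mod q,  Y_E = g^e.
    re = ra * pow(ya, bar(ra), P) * pow(G, -1, P) % P
    e = pow(bar(re), -1, Q)
    ye = pow(G, e, P)                                # certified as E's static key

    rb = pow(G, y, P)                                # B -> E, forwarded by E to A
    k_b = mqv_key(b, y, rb, ye, re)                  # B believes he talks to E
    k_a = mqv_key(a, x, ra, yb, rb)                  # A believes she talks to B
    return k_a, k_b


results = [kaliski_attack(s) for s in range(10)]
print(all(ka == kb for ka, kb in results if ka and kb))  # True
```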



4.4 One-pass variants 

The purpose of a one-pass AK protocol is for entities A and B to agree upon 
a session key by only having to transmit one message from A to B; this 
assumes that A a priori has an authentic copy of B's static public key. One-pass 
protocols can be useful in applications where only one entity is on-line, such as 
secure email. Their main security drawbacks are that they offer neither known-key 
security (since an adversary can replay A's ephemeral public key to B) 
nor forward secrecy (since entity B does not contribute a random per-message 
component).

The three two-pass AK protocols (KEA, Unified Model, MQV) presented in 
this section can be converted to one-pass AK protocols by simply setting B's 
ephemeral public key equal to his static public key. We illustrate this next for 
the one-pass variant of the MQV protocol. A summary of the security services 
of the three one-pass variants is provided in Table 1.

Protocol 7 (One-pass MQV) 

1. A selects $x \in_R [1, q-1]$ and sends $R_A = g^x$ and $\mathrm{Cert}_A$ to B. 

2. A computes $s_A = (x + a\bar{R}_A) \bmod q$ and the shared secret $K = (Y_B (Y_B)^{\bar{Y}_B})^{s_A} \bmod p$. 
If $K = 1$, then A terminates the protocol run with failure.



Authenticated Diffie-Hellman Key Agreement Protocols 351 



3. B verifies that $1 < R_A < p$ and $(R_A)^q \equiv 1 \pmod p$. If any check fails, 
then B terminates the protocol run with failure. Otherwise, B computes 
$s_B = (b + b\bar{Y}_B) \bmod q$ and the shared secret $K = (R_A (Y_A)^{\bar{R}_A})^{s_B} \bmod p$. If $K = 1$, 
then B terminates the protocol run with failure. 

4. The session key is $k = H(K)$.



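Fig. 7. Protocol 7 (One-pass MQV). A (static key a, ephemeral key x) sends $g^x$ to B (static key b); A computes $s_A = (x + a\overline{g^x}) \bmod q$ and $K = (g^b (g^b)^{\overline{g^b}})^{s_A}$, and B computes $s_B = (b + b\overline{g^b}) \bmod q$ and $K = (g^x (g^a)^{\overline{g^x}})^{s_B}$.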
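A toy version of Protocol 7 is sketched below as a small illustration, assuming p = 2039, q = 1019, g = 4, SHA-256 for H, and no certificates (all assumptions; helper names hypothetical). B's ephemeral key is simply his static key, so only one flow, $R_A$, is sent.

```python
import hashlib
import random

P, Q, G = 2039, 1019, 4                        # toy parameters (assumption)
HALF = (Q.bit_length() + 1) // 2


def bar(X):
    return (X % (1 << HALF)) + (1 << HALF)


def H(K):
    return hashlib.sha256(str(K).encode()).hexdigest()   # stand-in for H


def sender(a, yb, rng):
    """Steps 1-2 (A): pick x, send R_A, and derive K from B's static key alone."""
    x = rng.randrange(1, Q)
    ra = pow(G, x, P)
    s_a = (x + a * bar(ra)) % Q
    K = pow(yb * pow(yb, bar(yb), P) % P, s_a, P)   # (Y_B * Y_B^bar(Y_B))^s_A
    return ra, (None if K == 1 else H(K))


def receiver(b, yb, ya, ra):
    """Step 3 (B): validate R_A, then act as if y = b and R_B = Y_B."""
    if not (1 < ra < P and pow(ra, Q, P) == 1):
        return None
    s_b = (b + b * bar(yb)) % Q
    K = pow(ra * pow(ya, bar(ra), P) % P, s_b, P)   # (R_A * Y_A^bar(R_A))^s_B
    return None if K == 1 else H(K)


def run(seed):
    rng = random.Random(seed)
    a, b = rng.randrange(1, Q), rng.randrange(1, Q)
    ya, yb = pow(G, a, P), pow(G, b, P)
    ra, k_a = sender(a, yb, rng)                   # the single flow A -> B
    return k_a, receiver(b, yb, ya, ra)


k_a, k_b = run(seed=11)
print(k_a == k_b)  # True
```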



5 AKC protocols 

This section discusses AKC protocols and describes a method to derive AKC 
protocols from AK protocols. 

The following three-pass AKC protocol is derived from the Unified Model 
AK protocol (Protocol 5) by adding the MACs of the flow number, identities, and 
the ephemeral public keys. Here, $H_1$ and $H_2$ are 'independent' hash functions. 
In practice, one may derive both $H_1$ and $H_2$ from a single cryptographic hash 
function H.

The MACs are computed under the shared key k', which is different from 
the session key k; Protocol 8 thus offers implicit key confirmation. If explicit 
key confirmation were to be provided by using the session key k as the MAC 
key, then a passive adversary would learn some information about k: the 
MAC of a known message under k. The adversary can use this to distinguish 
k from a key selected uniformly at random from the key space. This variant 
therefore sacrifices the desired goal that a protocol establish a computationally 
indistinguishable key. The maxim that a key establishment protocol can be used 
as a drop-in replacement for face-to-face key establishment therefore no longer 
applies, and in theory security must be analyzed on a case-by-case basis. We 
therefore prefer Protocol 8.
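The distinguishing argument is easy to make concrete. In this sketch HMAC-SHA256 is assumed as the MAC and the message encoding is hypothetical: a passive adversary who sees a MAC of a known message under k can test any candidate key, so the real session key is always recognized while a uniformly random key almost never is.

```python
import hashlib
import hmac
import secrets


def looks_real(candidate_key: bytes, observed_tag: bytes, known_msg: bytes) -> bool:
    """Passive distinguisher: does the candidate key reproduce the MAC seen on the wire?"""
    tag = hmac.new(candidate_key, known_msg, hashlib.sha256).digest()
    return hmac.compare_digest(tag, observed_tag)


session_key = secrets.token_bytes(32)          # k, (hypothetically) reused as the MAC key
known_msg = b"3|A|B|R_A|R_B"                   # hypothetical encoding of a known flow
observed_tag = hmac.new(session_key, known_msg, hashlib.sha256).digest()

print(looks_real(session_key, observed_tag, known_msg))              # True
print(looks_real(secrets.token_bytes(32), observed_tag, known_msg))  # False, except with negligible probability
```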

Protocol 8 (Unified Model with key confirmation) 

1. A selects $x \in_R [1, q-1]$ and sends $R_A = g^x$ and $\mathrm{Cert}_A$ to B. 

2. (a) B verifies that $1 < R_A < p$ and $(R_A)^q \equiv 1 \pmod p$. If any check fails, 

then B terminates the protocol run with failure. 

(b) B selects $y \in_R [1, q-1]$, and computes $R_B = g^y$, $k' = H_1((Y_A)^b \| (R_A)^y)$, 

$k = H_2((Y_A)^b \| (R_A)^y)$, and $m_B = \mathrm{MAC}_{k'}(2, B, A, R_B, R_A)$. 

(c) B sends $R_B$, $\mathrm{Cert}_B$, and $m_B$ to A. 

3. (a) A verifies that $1 < R_B < p$ and $(R_B)^q \equiv 1 \pmod p$. If any check fails, 

then A terminates the protocol run with failure. 

(b) A computes $k' = H_1((Y_B)^a \| (R_B)^x)$ and $m_B' = \mathrm{MAC}_{k'}(2, B, A, R_B, R_A)$, 
and verifies that $m_B' = m_B$. 

(c) A computes $m_A = \mathrm{MAC}_{k'}(3, A, B, R_A, R_B)$ and $k = H_2((Y_B)^a \| (R_B)^x)$, 
and sends $m_A$ to B. 

4. B computes $m_A' = \mathrm{MAC}_{k'}(3, A, B, R_A, R_B)$ and verifies that $m_A' = m_A$. 

5. The session key is k.



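Fig. 8. Protocol 8 (Unified Model with key confirmation). A (static key a, ephemeral key x) sends $g^x$; B (static key b, ephemeral key y) replies with $g^y$ and $\mathrm{MAC}_{k'}(2, B, A, g^y, g^x)$; A replies with $\mathrm{MAC}_{k'}(3, A, B, g^x, g^y)$. Both compute $k' = H_1(g^{ab} \| g^{xy})$ and $k = H_2(g^{ab} \| g^{xy})$.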
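A compact simulation of Protocol 8 is given below as an illustration only. Its assumptions are labelled in the code: toy parameters p = 2039, q = 1019, g = 4; SHA-256 with the distinct prefixes 'H1' and 'H2' as a stand-in for the independent hash functions; HMAC-SHA256 as the MAC; identities encoded as the strings 'A' and 'B'; certificates omitted. Setting tamper=True replaces B's MAC, and A then aborts.

```python
import hashlib
import hmac
import random

P, Q, G = 2039, 1019, 4   # toy parameters (assumption)


def _h(label, *elements):
    data = label + b"|" + "||".join(map(str, elements)).encode()
    return hashlib.sha256(data).digest()


def H1(*elements):            # domain-separated SHA-256 standing in for H_1
    return _h(b"H1", *elements)


def H2(*elements):            # ... and for H_2
    return _h(b"H2", *elements)


def mac(key, *fields):        # HMAC-SHA256 standing in for MAC_k'
    return hmac.new(key, "|".join(map(str, fields)).encode(), hashlib.sha256).digest()


def valid(r):
    return 1 < r < P and pow(r, Q, P) == 1


def protocol8(seed, tamper=False):
    """Returns (k_A, k_B) on success, or None if either party aborts."""
    rng = random.Random(seed)
    a, b = rng.randrange(1, Q), rng.randrange(1, Q)
    ya, yb = pow(G, a, P), pow(G, b, P)

    x = rng.randrange(1, Q)
    ra = pow(G, x, P)                                   # flow 1: R_A (and Cert_A)

    if not valid(ra):                                   # step 2(a)
        return None
    y = rng.randrange(1, Q)                             # step 2(b)
    rb = pow(G, y, P)
    z_b = (pow(ya, b, P), pow(ra, y, P))
    kp_b, k_b = H1(*z_b), H2(*z_b)
    m_b = mac(kp_b, 2, "B", "A", rb, ra)
    if tamper:
        m_b = bytes(len(m_b))                           # an active adversary alters flow 2

    if not valid(rb):                                   # step 3(a)
        return None
    z_a = (pow(yb, a, P), pow(rb, x, P))                # step 3(b)
    kp_a = H1(*z_a)
    if not hmac.compare_digest(mac(kp_a, 2, "B", "A", rb, ra), m_b):
        return None
    m_a = mac(kp_a, 3, "A", "B", ra, rb)                # step 3(c)
    k_a = H2(*z_a)

    if not hmac.compare_digest(mac(kp_b, 3, "A", "B", ra, rb), m_a):   # step 4
        return None
    return k_a, k_b


result = protocol8(seed=2)
print(result is not None and result[0] == result[1])   # True
print(protocol8(seed=2, tamper=True))                  # None
```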



In a similar manner, one can derive three-pass AKC protocols from the KEA 
(KEA with key confirmation) and MQV (MQV with key confirmation) AK pro- 
tocols. The AKC variants of the Unified Model and MQV protocols are being 
considered for inclusion in ANSI X9.63. 

A summary of the security services provided by the three AKC variants is given 
in Table 1. This table illustrates why AKC protocols may be preferred 
over AK protocols in practice. First, the incorporation of key confirmation may 
provide additional security attributes which are not present in the AK protocol. 
For example, addition of key confirmation in the manner described above 
makes the MQV protocol resistant to unknown key-share attacks. Second, the 
security properties of AKC protocols appear to be better understood; see also 
the discussion in Section 7. Note that since the MACs can be computed efficiently, 
this method of adding key confirmation to an AK protocol does not place a 
significant computational burden on the key establishment mechanism.

6 Comparison 

This section compares the security and efficiency of the protocols presented in 
Sections 4 and 5.

Security services. Table 1 contains a summary of the services that are believed 
to be provided by the AK and AKC protocols discussed in this paper. Although 
only implicit and explicit key authentication are considered vital properties of 
key establishment, any new results related to other information in this table 
would be interesting.

The services are discussed in the context of an entity A who has successfully 
executed the key agreement protocol over an open network wishing to establish 
keying data with entity B. In the table: 
Scheme                                             IKA   EKA   K-KS   FS     K-CI   UK-S
Ephemeral Diffie-Hellman                           ×     ×            n/a    n/a    ×
Ephemeral Diffie-Hellman (against passive attack)  √√    ×     √√     n/a    n/a    √√
Static Diffie-Hellman                              √√    ×     ×      ×      ×      √√
One-pass KEA                                       √√    ×     ×      ×      √I     √√
KEA                                                √√    ×     √√     ×      √√     √√
KEA with Key Confirmation                          √√    √√    √√     ×      √√     √√
One-pass Unified Model                             √√    ×     ×      ×      √I     √√
Unified Model                                      √√    ×            (√)^b  ×      √√
Unified Model with Key Confirmation                √√    √√    √√     √√     ×      √√
One-pass MQV                                       √√    ×     ×      ×      √I     ×
MQV                                                √√    ×     √√     (√)^b  √√     ×
MQV with Key Confirmation                          √√    √√    √√     √√     √√     √√


^a Here the technicality hinges on the definition of what constitutes 'another session 
key'. The service of known-key security is certainly provided if the protocol is 
extended so that explicit authentication of all session keys is supplied. 

^b Again the technicality concerns key confirmation. Both protocols provide forward 
secrecy if explicit authentication is supplied for all session keys. If not supplied, then 
the service of forward secrecy cannot be guaranteed.

Table 1. Security services offered by authenticated key agreement protocols. 



- √√ indicates that the assurance is provided to A no matter whether A 
initiated the protocol or not. 

- (√) indicates that the assurance is provided modulo a theoretical technicality. 

- √I indicates that the assurance is provided to A only if A is the protocol's 
initiator. 

- × indicates that the assurance is not provided to A by the protocol.

The names of the services have been abbreviated to save space: IKA denotes 
implicit key authentication, EKA explicit key authentication, K-KS known-key 
security, FS forward secrecy, K-CI key-compromise impersonation, and UK-S 
unknown key-share.

The provision of these assurances is considered in the case that both A and 
B are honest and have always executed the protocol correctly. The requirement 
that A and B are honest is certainly necessary for the provision of any service by 
a key establishment protocol: no key establishment protocol can protect against 
a dishonest entity who chooses to reveal the session key, just as no encryption 
scheme can guard against an entity who chooses to reveal confidential data.

Efficiency. The work done by each entity is dominated by the time to perform 
the modular exponentiations. The total number of modular exponentiations per 
entity for the KEA, Unified Model, and MQV AK protocols is 4, 4, and 3.5, 
respectively. If precomputations (of quantities involving the entity's static and 
ephemeral keys and the other entity's static keys) are discounted, then the total 
number of on-line modular exponentiations per entity reduces to 2, 2, and 2.5, 
respectively.
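One per-entity breakdown that reproduces these counts is tallied below. It is our reading rather than something spelled out in the text: it assumes that the validation check $(R_B)^q \equiv 1 \pmod p$ costs one exponentiation and that the half-length exponent $\bar{R}_B$ in MQV costs half of one. The operation labels are hypothetical.

```python
# Hypothetical per-entity breakdown (our reading, not spelled out in the text).
# 1 = one full modular exponentiation; 0.5 = a half-length exponent such as bar(R_B).
COSTS = {
    "KEA":           {"g^x": 1, "R_B^q check": 1, "Y_B^x": 1, "R_B^a": 1},
    "Unified Model": {"g^x": 1, "R_B^q check": 1, "Y_B^a": 1, "R_B^x": 1},
    "MQV":           {"g^x": 1, "R_B^q check": 1, "Y_B^bar(R_B)": 0.5, "(...)^s_A": 1},
}
# Terms that use only the entity's own keys and the peer's static key can be precomputed.
PRECOMPUTABLE = {"g^x", "Y_B^x", "Y_B^a"}


def tally():
    out = {}
    for name, ops in COSTS.items():
        total = sum(ops.values())
        online = sum(cost for op, cost in ops.items() if op not in PRECOMPUTABLE)
        out[name] = (total, online)
    return out


for name, (total, online) in tally().items():
    print(f"{name}: {total} total, {online} on-line")
# KEA: 4 total, 2 on-line
# Unified Model: 4 total, 2 on-line
# MQV: 3.5 total, 2.5 on-line
```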

As noted in Section 5, MACs can be computed efficiently and hence the AKC 
variants have essentially the same computational overhead as their AK counterparts. 
They do, however, require an extra flow.



7 Provable security 

This section discusses methods that have been used to formally analyze key 
agreement protocols. The goal of these methods is to facilitate the design of 
secure protocols that avoid subtle flaws like those described earlier in this paper. We examine 
two approaches, provable security and formal methods, focusing on the former.

Provable security was invented in the 1980s and applied to encryption schemes 
and signature schemes. The process of proving security of a protocol comes in 
five stages:

1. Specification of model. 

2. Definition of goals within this model. 

3. Statement of assumptions. 

4. Description of protocols. 

5. Proof that the protocol meets its goals within the model. 

As has been discussed elsewhere, the emphasis of work in provable security of a protocol 
should be how appropriate the model, definitions, and underlying assumptions 
are, rather than the mere statement that a protocol attains provable security; 
after all, all protocols are provably secure in some model, under some definitions, 
or under some assumptions.

History of provable security. Building on earlier informal work of Bird et al. 
for the symmetric setting and Diffie, van Oorschot and Wiener for the 
asymmetric setting, Bellare and Rogaway provided a model of distributed 
computing and rigorous security definitions, proposed concrete two-party 
authenticated key transport protocols in the symmetric setting, and proved them 
secure under the assumption that a pseudorandom function family exists. They 
then extended the model to handle the three-party (Kerberos) case; see also 
Shoup and Rubin for an extension of this work to the smart card world. 
Blake-Wilson and Menezes, and Blake-Wilson, Johnson and Menezes, 
extended the Bellare-Rogaway model to the asymmetric setting, and proposed 
and proved the security of some authenticated key transport, AK, and AKC 
protocols (see Section 7.1). More recently, Bellare, Canetti and Krawczyk provided 
a systematic method for transforming authentication protocols that are secure 
in a model of idealized authenticated communications into protocols that are 
secure against active attacks; their work is discussed further in Section 7.2.
Formal methods. These are methods for analyzing cryptographic protocols 
in which the communications system is described using a formal specification 
language which has some mathematical basis, from which security properties of 
the protocol can be inferred. The most widely used of these methods are those related to the BAN logic of 
Burrows, Abadi and Needham, which was extended by van Oorschot 
to enable the formal analysis of authenticated key agreement protocols in the 
asymmetric setting. Such methods begin with a set of beliefs for the participants 
and use logical inference rules to derive a belief that the protocol goals have been 
attained.

Such formal methods have been useful in uncovering flaws and redundancies 
in protocols. However, they suffer from a number of shortcomings when
considered as tools for designing high-assurance protocols. First, a proof that a 
protocol is logically correct does not imply that it is secure. This is especially 
the case because the process of converting a protocol into a formal specification 
may itself be subject to subtle flaws. Second, there is no clear security model 
associated with the formal systems used and thus it is hard to assess whether 
the implied threat model corresponds with the requirements of an application. 
Therefore, we believe that provable security techniques offer greater assurance 
than formal methods and we focus on provable security for the remainder of this 
section. 



7.1 Bellare-Rogaway model of distributed computing 

Work on the design of provably secure authenticated key agreement has largely 
focused on the Bellare-Rogaway model of distributed computing.

The Bellare-Rogaway model, depicted in Figure 9, is a formal model of 
communication over an open network in which the adversary E is afforded enormous 
power. She controls all communication between entities, and can at any time ask 
an entity to reveal its static private key. Furthermore, she may at any time 
initiate sessions between any two entities, engage in multiple sessions with the 
same entity at the same time, and ask an entity to enter a session with itself. We 
provide an informal description of the Bellare-Rogaway model, and informal 
definitions of the goals of secure AK and AKC protocols.

In the model, E is equipped with a collection of $\Pi^s_{A,B}$ oracles. $\Pi^s_{A,B}$ models 
entity A who believes she is communicating with entity B for the s-th time. E is 
allowed to make three types of queries of its oracles: 

Send($\Pi^s_{A,B}$, x): E gives a particular oracle x as input and learns the 
oracle's response. 

Reveal($\Pi^s_{A,B}$): E learns the session key (if any) the oracle currently holds. 

Corrupt(A): E learns A's static private key. 

When E asks an oracle a query, the oracle computes its response using the 
description of the protocol. Security goals are defined in the context of running E 
in the presence of these oracles.
Secure key agreement is now captured by a test involving an additional Test 
query. At the end of its experiment, E selects a fresh oracle $\Pi^s_{A,B}$ (an oracle 
which has accepted a session key k, and where the adversary has not learned k 
by trivial means, either by corrupting A or B, or by issuing a Reveal query to 
$\Pi^s_{A,B}$ or to any $\Pi^t_{B,A}$ oracle which has had a matching conversation with 
$\Pi^s_{A,B}$) and asks it the Test query. The oracle replies with either its session 
key k or a random key, and the adversary's job is to decide which key it has 
been given.




Fig. 9. The Bellare-Rogaway model of distributed computing. 



Definition 1. (Informal) An AK protocol is secure if: 

(i) The protocol successfully distributes keys in the absence of an adversary. 

(ii) No adversary E can distinguish a session key held by a fresh $\Pi^s_{A,B}$ oracle 
from a key selected uniformly at random. 

A secure AKC protocol is defined by amalgamating the notion of entity 
authentication with the notion of a secure AK protocol. 

Definition 2. (Informal) An AKC protocol is secure if, in addition to 
conditions (i) and (ii) of Definition 1: 

(iii) The only way an adversary E can induce a $\Pi^s_{A,B}$ oracle to accept a session 
key is by honestly transmitting messages between $\Pi^s_{A,B}$ and some $\Pi^t_{B,A}$.

The security of the Unified Model (Protocol 5) and the Unified Model with 
key confirmation (Protocol 8) in the Bellare-Rogaway model was proven under 
certain assumptions. 

Theorem 1. Protocol 5 is a secure AK protocol in the Bellare-Rogaway 
model provided that: 

(i) the adversary makes no Reveal queries; 
(ii) the Diffie-Hellman problem is hard; and 
(iii) H is a random oracle. 

Theorem 2. Protocol 8 is a secure AKC protocol in the Bellare-Rogaway 
model provided that: 

(i) the Diffie-Hellman problem is hard; 

(ii) the MAC is secure; and 

(iii) $H_1$ and $H_2$ are independent random oracles.

A random oracle is a 'black-box' random function which is supplied to all 
entities, including the adversary. The assumption that H, $H_1$ and $H_2$ are random 
oracles is a very powerful one and facilitates security analysis. This so-called 
random oracle model was introduced and popularized by Bellare and Rogaway. 
In practice, the random oracles are instantiated with hash functions; 
therefore the security proofs in the random oracle model are no longer valid for the 
practical implementation. Nonetheless, and despite recent results demonstrating the 
limitations of the random oracle model, it is a thesis that protocols proven 
secure in the random oracle model provide higher security assurances than protocols 
deemed secure by ad-hoc means.
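Random oracles are usually simulated by lazy sampling, which is what the informal definition amounts to: fresh queries get fresh uniform outputs, and repeated queries get the same answer. The class below is a hypothetical illustration, not a construction from the paper, and it omits the 'programming' of the oracle that security proofs also rely on.

```python
import secrets


class RandomOracle:
    """Lazily sampled random function, shared by all parties including the adversary.

    Each new query receives a fresh uniformly random output; repeated queries are
    answered consistently from the table.
    """

    def __init__(self, out_len: int = 32):
        self.out_len = out_len
        self.table = {}

    def __call__(self, query: bytes) -> bytes:
        if query not in self.table:
            self.table[query] = secrets.token_bytes(self.out_len)
        return self.table[query]


H = RandomOracle()
print(H(b"g^ab||g^xy") == H(b"g^ab||g^xy"), len(H.table))   # True 1
```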

To see that Protocol 5 is not a secure AK protocol in the Bellare-Rogaway 
model if the adversary is allowed to make Reveal queries, consider the following 
interleaving/reflection attack. Suppose that A initiates two runs of the protocol; let 
A's ephemeral public keys be $g^x$ and $g^{x'}$ in the first and second runs, respectively. 
The adversary E then replays $g^{x'}$ and $g^x$ to A in the first and second runs, 
respectively, purportedly as B's ephemeral public keys. A computes both session 
keys as $k = H(g^{ab} \| g^{xx'})$. E can now Reveal one session key, and thus also learn 
the other.

It has been conjectured that the modification of Protocol 5 in which the 
session key is formed as $k = H(g^{ay} \| g^{bx})$ is a secure AK protocol assuming only 
that the Diffie-Hellman problem is hard and that H is a random oracle.
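Both the reflection attack on Protocol 5 and the effect of the conjectured modification can be checked on toy parameters (p = 2039, q = 1019, g = 4, SHA-256 for H; all assumptions, helper names hypothetical). The sketch computes the two session keys A derives when E reflects A's own ephemeral keys back to her.

```python
import hashlib
import random

P, Q, G = 2039, 1019, 4   # toy parameters (assumption)


def H(*elements):
    return hashlib.sha256("||".join(map(str, elements)).encode()).hexdigest()


def key_protocol5(a, yb, x, rb):
    """A's key in Protocol 5: H((Y_B)^a || (R_B)^x) = H(g^{ab} || g^{xy})."""
    return H(pow(yb, a, P), pow(rb, x, P))


def key_modified(a, yb, x, rb):
    """A's key in the conjectured variant: H((R_B)^a || (Y_B)^x) = H(g^{ay} || g^{bx})."""
    return H(pow(rb, a, P), pow(yb, x, P))


def reflection(key_fn, seed):
    rng = random.Random(seed)
    a, b = rng.randrange(1, Q), rng.randrange(1, Q)
    yb = pow(G, b, P)
    x1, x2 = rng.sample(range(1, Q), 2)     # A's two runs, distinct ephemeral keys
    r1, r2 = pow(G, x1, P), pow(G, x2, P)
    k_run1 = key_fn(a, yb, x1, r2)          # E replays g^{x'} in run 1 ...
    k_run2 = key_fn(a, yb, x2, r1)          # ... and g^{x} in run 2
    return k_run1 == k_run2                 # True: one Reveal exposes the other key


print(reflection(key_protocol5, seed=4))    # True
print(reflection(key_modified, seed=4))     # False
```

In the modified variant the two runs hash $g^{ax'} \| g^{bx}$ and $g^{ax} \| g^{bx'}$, which differ whenever $x \neq x'$.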



7.2 A modular approach 

Recently, Bellare, Canetti and Krawczyk have suggested an approach to the 
design of provably secure key agreement protocols that differs from the 
Bellare-Rogaway model. Their approach is modular: it starts with protocols 
that are secure in a model of idealized authenticated communication and then 
systematically transforms them into protocols which are secure in the realistic 
unauthenticated setting. This approach has the advantage that a new proof 
of security is not required for each protocol; instead, once the approach is 
justified, it can be applied to any protocol that works in the ideal model. On 
the other hand, it is less clear what practical guarantees are provided, so the 
evaluation of whether the guarantees are appropriate in an application is perhaps 
less understood. The following is an informal overview of their approach.






Simon Blake-Wilson and Alfred Menezes 



Authenticators. Authenticators are key to the systematic transformations at 
the heart of the modular approach. They are compilers that take as input a 
protocol designed for authenticated networks, and transform it into an 'equivalent' 
protocol for unauthenticated networks. The notion of equivalence or emulation 
is formalized as follows. A protocol P' designed for unauthenticated networks 
is said to emulate a protocol P designed for authenticated networks if, for each 
adversary E' of P', there exists an adversary E of P such that, for all inputs 
x, the views $V_{P,E}(x)$ and $V_{P',E'}(x)$ are computationally indistinguishable. (The 
view $V_{P,E}(x)$ of a protocol P which is run on input x in the presence of an 
adversary E is the random variable describing the cumulative outputs of E and 
all the legitimate entities.)

MT-authenticators. In the work of Bellare, Canetti and Krawczyk, authenticators are realized using the simpler idea 
of an MT-authenticator, which emulates the most straightforward message 
transmission (MT) protocol, in which a single message is passed from A to B, as 
depicted in Figure 10. Figure 11 illustrates the protocol $\lambda_{sig}$, which is proven there 
to be an MT-authenticator. In the figure, $\mathrm{sign}_A(\cdot)$ denotes A's signature using 
a signature scheme that is secure against chosen message attacks. 

Now an MT-authenticator $\lambda$ can be used to construct a compiler $C_\lambda$ as follows: 
given a protocol P, $P' = C_\lambda(P)$ is the protocol obtained by applying $\lambda$ to each 
message transmitted by P. It has been proven that $C_\lambda$ is indeed an authenticator.



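Fig. 10. Message transmission protocol (MT): A sends a single message m to B. 

Fig. 11. MT-authenticator $\lambda_{sig}$: A sends m to B; B replies with a fresh challenge $N_B \in_R \{0,1\}^k$; A replies with $\mathrm{sign}_A(m, N_B, B)$.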
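A toy rendering of $\lambda_{sig}$ follows. It is a sketch only: Schnorr signatures over a deliberately tiny group (p = 2039, q = 1019, g = 4) are swapped in as the chosen-message-secure signature scheme, and the helper names are hypothetical. With such a small q, a replayed or forged signature still verifies with probability about 1/q, so this shows the message flow, not security. Python 3.8+ is needed for pow with a negative exponent.

```python
import hashlib
import random
import secrets

# Toy Schnorr group (assumption): p = 2q + 1, g = 4 of prime order q; far too small
# for real use.
P, Q, G = 2039, 1019, 4
_rng = random.SystemRandom()


def _hash(*parts) -> int:
    digest = hashlib.sha256("|".join(map(str, parts)).encode()).digest()
    return int.from_bytes(digest, "big") % Q


def keygen():
    sk = _rng.randrange(1, Q)
    return sk, pow(G, sk, P)


def sign(sk, msg):
    k = _rng.randrange(1, Q)
    r = pow(G, k, P)
    e = _hash(r, msg)
    return e, (k + sk * e) % Q


def verify(pk, msg, sig):
    e, s = sig
    r = pow(G, s, P) * pow(pk, -e, P) % P       # g^s * pk^(-e) = g^k for a valid signature
    return _hash(r, msg) == e


def mt_authenticate(m, sk_a, pk_a, receiver="B"):
    """lambda_sig: A -> B: m;  B -> A: N_B;  A -> B: sign_A(m, N_B, B)."""
    n_b = secrets.token_hex(16)                            # B's fresh challenge N_B
    sig = sign(sk_a, (m, n_b, receiver))                   # computed by A
    return n_b, sig, verify(pk_a, (m, n_b, receiver), sig) # B's acceptance decision


sk_a, pk_a = keygen()
n_b, sig, accepted = mt_authenticate("hello", sk_a, pk_a)
print(accepted)  # True
```

Applying mt_authenticate to every message of a protocol P is, informally, what the compiler $C_\lambda$ does; the fresh challenge is what stops an old signature from being replayed in a new session.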



Key establishment. Finally, this MT-authenticator is used to build a secure 
authenticated key agreement protocol. It is first shown that ephemeral 
Diffie-Hellman EDH (Protocol 1) is a secure key establishment protocol for 
authenticated networks, by showing that it emulates traditional face-to-face key 
establishment. Then, EDH is emulated using $C_{\lambda_{sig}}$. The result 
$C_{\lambda_{sig}}(\mathrm{EDH})$ is a secure six-pass authenticated key agreement protocol. Combining 
messages from different flows, and replacing the challenges $N_A$ and $N_B$ with 
the ephemeral public keys $g^x$ and $g^y$, respectively, yields the three-pass BCK 
protocol, depicted in Figure 12.

The BCK protocol is similar to Key Agreement Mechanism 7 in ISO/IEC 
11770-3. In the latter, the MACs of the signatures under the shared 
secret $K = g^{xy}$ are also included in flows 2 and 3, thus providing explicit key 
confirmation, instead of just the implicit key confirmation provided by the BCK 
protocol.



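Fig. 12. BCK protocol. A (ephemeral key x) sends $g^x$; B (ephemeral key y) replies with $g^y$ and $\mathrm{sign}_B(g^y, g^x, A)$; A replies with $\mathrm{sign}_A(g^x, g^y, B)$. Both compute $K = g^{xy}$.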
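The three flows of the BCK protocol can be sketched in the same spirit. As an assumption, Schnorr signatures over a tiny group (p = 2039, q = 1019, g = 4) stand in for a chosen-message-secure signature scheme, certificates are omitted, and the helper names are hypothetical; each party checks the other's signature before accepting $K = g^{xy}$. Python 3.8+ is required.

```python
import hashlib
import random

P, Q, G = 2039, 1019, 4   # toy group (assumption); far too small for real use
_rng = random.SystemRandom()


def _hash(*parts) -> int:
    digest = hashlib.sha256("|".join(map(str, parts)).encode()).digest()
    return int.from_bytes(digest, "big") % Q


def keygen():
    sk = _rng.randrange(1, Q)
    return sk, pow(G, sk, P)


def sign(sk, msg):                                # Schnorr signature (stand-in)
    k = _rng.randrange(1, Q)
    e = _hash(pow(G, k, P), msg)
    return e, (k + sk * e) % Q


def verify(pk, msg, sig):
    e, s = sig
    return _hash(pow(G, s, P) * pow(pk, -e, P) % P, msg) == e


def bck():
    (sk_a, pk_a), (sk_b, pk_b) = keygen(), keygen()
    x = _rng.randrange(1, Q)
    gx = pow(G, x, P)                             # flow 1: A -> B: g^x
    y = _rng.randrange(1, Q)
    gy = pow(G, y, P)
    sig_b = sign(sk_b, (gy, gx, "A"))             # flow 2: B -> A: g^y, sign_B(g^y, g^x, A)
    if not verify(pk_b, (gy, gx, "A"), sig_b):    # A checks B's signature
        return None
    sig_a = sign(sk_a, (gx, gy, "B"))             # flow 3: A -> B: sign_A(g^x, g^y, B)
    if not verify(pk_a, (gx, gy, "B"), sig_a):    # B checks A's signature
        return None
    return pow(gy, x, P), pow(gx, y, P)           # K = g^{xy} as computed by A and by B


k_a, k_b = bck()
print(k_a == k_b)  # True
```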



8 Conclusions and future work 

This paper surveyed practical and provable security aspects of some 
authenticated Diffie-Hellman key agreement protocols that are being considered for 
standardization.

A number of questions can be asked. Can the MQV protocol be proven secure 
in a reasonable model of computing? Are the definitions of secure AK and AKC 
protocols in Section 7.1 the right ones? How do the models and security definitions 
presented in Sections 7.1 and 7.2 compare? Are the security proofs meaningful in practice? 

That is, can the reductions used in the proofs be utilized to obtain meaningful 
measures of exact security? (Exact security is a concrete quantification 
of the security guaranteed by a protocol in terms of the perceived security of 
the underlying cryptographic primitives, e.g., the Diffie-Hellman problem or a 
secure MAC algorithm.)

Two important tasks that remain are to devise a provably secure two-pass 
AK protocol, and to provide formal definitions for secure one-pass key agreement 
protocols. 
Abstract. Skipjack is the secret key encryption algorithm developed by 
the NSA for the Clipper chip and Fortezza PC card. It uses an 80-bit 
key, 128 table lookup operations, and 320 XOR operations to map a 
64-bit plaintext into a 64-bit ciphertext in 32 rounds. This paper describes 
an efficient attack on a variant, which we call Skipjack-3XOR (Skipjack 
minus 3 XORs). The only difference between Skipjack and Skipjack-3XOR 
is the removal of 3 out of the 320 XOR operations. The attack uses 
the ciphertexts derived from about 500 plaintexts and its total running 
time is equivalent to about one million Skipjack encryptions, which can 
be carried out in seconds on a personal computer. We also present a new 
cryptographic tool, which we call the Yoyo game, and efficient attacks 
on Skipjack reduced to 16 rounds. We conclude that Skipjack does not 
have a conservative design with a large margin of safety.

Key words: Cryptanalysis, Skipjack, Yoyo Game, Clipper chip, Fortezza 
PC card. 



1 Introduction 

Skipjack is the secret key encryption algorithm developed by the NSA for the 
Clipper chip and Fortezza PC card. It was implemented in tamper-resistant 
hardware and its structure was kept secret since its introduction in 1993. 

S. Tavares and H. Meijer (Eds.): SAC’98, LNCS 1556, pp. 362-^^| 1999. 
To increase confidence in the strength of Skipjack and the Clipper chip 
initiative, five well-known cryptographers were assigned in 1993 to analyze Skipjack 
and report their findings [4]. They investigated the strength of Skipjack using 
differential cryptanalysis [3] and other methods, and concentrated on reviewing 
NSA's design and evaluation process. They reported that Skipjack is a 
"representative of a family of encryption algorithms developed in 1980 as part of the 
NSA suite of 'Type I' algorithms, suitable for protecting all levels of classified 
data. The specific algorithm, SKIPJACK, is intended to be used with sensitive 
but unclassified information." They concluded that "Skipjack is based on some 
of NSA's best technology" and quoted the head of the NSA evaluation team, who 
confidently concluded "I believe that Skipjack can only be broken by brute force 
- there is no better way."

On June 24th, 1998, Skipjack was declassified, and its description was made 
public on the NIST web site [7]. It uses an 80-bit key, 32 · 4 = 128 table lookup 
operations, and 32 · 10 = 320 XOR operations to map a 64-bit plaintext into a 
64-bit ciphertext in 32 rounds.

This paper summarizes our initial analysis. We study the differential [3] and 
linear [6] properties of Skipjack, together with other observations on the design 
of Skipjack. Then, we use these observations to present a differential attack on 
Skipjack reduced to 16 rounds, using about 2^^ chosen plaintexts and steps of 
analysis. Some of these results are based on important observations communicated 
to us by David Wagner [8].

We present a new cryptographic tool, which we call the Yoyo game, applied to 
Skipjack reduced to 16 rounds. This tool can be used to identify pairs satisfying 
a certain property, and be used as a tool for attacking Skipjack reduced to 16 
rounds using only 2^"^ adaptive chosen plaintexts and ciphertexts and 2^'^ steps 
of analysis. This tool can also be used as a distinguisher to decide whether a 
given black box contains this variant of Skipjack, or a random permutation. 

We then present the main result of this paper, which is an exceptionally 
simple attack on a 32-round variant, which we call Skipjack-3XOR (Skipjack 
minus 3 XORs). The only difference between the actual Skipjack and Skipjack-3XOR 
is the removal of 3 out of the 320 XOR operations. The attack uses
the ciphertexts derived from about 500 plaintexts which are identical except 
for the second 16 bit word. Its total running time is equivalent to about one 
million Skipjack encryptions, which can be carried out in seconds on a personal 
computer. We thus believe that Skipjack does not have a conservative design 
with a large margin of safety. 

This paper is organized as follows: In Section 2 we describe the structure of 
Skipjack, and the main variants that we analyze in this paper. In Section 3 we 
present useful observations on the design, which we later use in our analysis. In 
Section 4 we describe a differential attack on a 16-round variant of Skipjack. The 
Yoyo game and its applications are described in Section 5. Finally, in Section 6 
we present our main attack on Skipjack-3XOR.



2 Description of Skipjack 

The published description of Skipjack characterizes the rounds as either Rule 
A or Rule B. Each round is described in the form of a linear feedback shift 
register with an additional nonlinear keyed G permutation. Rule B is basically 
the inverse of Rule A with minor positioning differences. Skipjack applies eight 
rounds of Rule A, followed by eight rounds of Rule B, followed by another eight 
rounds of Rule A, followed by another eight rounds of Rule B. The original 
definitions of Rule A and Rule B are given in Figure 1, where counter is the 
round number (in the range 1 to 32), G is a four-round Feistel permutation 
whose F function is defined as an 8x8-bit S box, called the F Table, and each round 
of G is keyed by eight bits of the key. The key scheduling of Skipjack takes a 
10-byte key, and uses four of its bytes at a time to key each G permutation. The 
first four bytes are used to key the first G permutation, and each additional G 
permutation is keyed by the next four bytes cyclically. 

Rule A: 
    $w_1^{k+1} = G^k(w_1^k) \oplus w_4^k \oplus \mathrm{counter}^k$ 
    $w_2^{k+1} = G^k(w_1^k)$ 
    $w_3^{k+1} = w_2^k$ 
    $w_4^{k+1} = w_3^k$ 

Rule B: 
    $w_1^{k+1} = w_4^k$ 
    $w_2^{k+1} = G^k(w_1^k)$ 
    $w_3^{k+1} = w_1^k \oplus w_2^k \oplus \mathrm{counter}^k$ 
    $w_4^{k+1} = w_3^k$ 

Fig. 1. Rule A and Rule B.
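The round structure is easy to express in code. The sketch below implements Rule A, Rule B, the 8/8/8/8 schedule, the cyclic key schedule and a four-round Feistel G, and checks that decryption inverts encryption and that one encryption uses 32 · 4 = 128 table lookups. Loud assumption: TOY_F is a hypothetical byte permutation standing in for the published F Table, and the G byte layout (g1, g2 in; g5, g6 out) is our reading of the specification, so outputs are not Skipjack test vectors.

```python
# Structural sketch only. ASSUMPTION: TOY_F is NOT the published Skipjack F Table.
TOY_F = [(7 * i + 3) % 256 for i in range(256)]
LOOKUPS = [0]                                     # counts F-table lookups


def F(v):
    LOOKUPS[0] += 1
    return TOY_F[v]


def _cv(key, k):                                  # key bytes 4k..4k+3, cyclic mod 10
    return [key[(4 * k + i) % 10] for i in range(4)]


def G(w, key, k):
    """Four-round Feistel permutation on a 16-bit word."""
    cv = _cv(key, k)
    g1, g2 = w >> 8, w & 0xFF
    g3 = F(g2 ^ cv[0]) ^ g1
    g4 = F(g3 ^ cv[1]) ^ g2
    g5 = F(g4 ^ cv[2]) ^ g3
    g6 = F(g5 ^ cv[3]) ^ g4
    return (g5 << 8) | g6


def G_inv(w, key, k):
    cv = _cv(key, k)
    g5, g6 = w >> 8, w & 0xFF
    g4 = F(g5 ^ cv[3]) ^ g6
    g3 = F(g4 ^ cv[2]) ^ g5
    g2 = F(g3 ^ cv[1]) ^ g4
    g1 = F(g2 ^ cv[0]) ^ g3
    return (g1 << 8) | g2


def is_rule_a(k):                                 # 0-based rounds 0-7 and 16-23
    return (k // 8) % 2 == 0


def encrypt(block, key):
    for k in range(32):                           # counter = k + 1
        w1, w2, w3, w4 = block
        g = G(w1, key, k)
        if is_rule_a(k):
            block = (g ^ w4 ^ (k + 1), g, w2, w3)
        else:
            block = (w4, g, w1 ^ w2 ^ (k + 1), w3)
    return block


def decrypt(block, key):
    for k in reversed(range(32)):
        v1, v2, v3, v4 = block
        w1 = G_inv(v2, key, k)                    # both rules put G(w1) in word 2
        if is_rule_a(k):
            block = (w1, v3, v4, v1 ^ v2 ^ (k + 1))
        else:
            block = (w1, v3 ^ w1 ^ (k + 1), v4, v1)
    return block


key = list(range(10))                             # hypothetical 80-bit key
pt = (0x0123, 0x4567, 0x89AB, 0xCDEF)
ct = encrypt(pt, key)
print(decrypt(ct, key) == pt, LOOKUPS[0])         # True 256  (128 lookups each way)
```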

The description becomes simpler (and the software implementation becomes 
more efficient) if we unroll the rounds, and keep the four elements in the shift 
register stationary. In this form the code is simply a sequence of alternate G 
operations and XOR operations of cyclically adjacent elements. In this representation 
the main difference between Rule A and Rule B is the direction in which
the adjacent elements are XORed (left to right or right to left). 

The XOR operations of Rule A and Rule B after round 8 and after round 24 
(on the borders between Rule A and Rule B) are consecutive without application 
of the G permutation in between. In the unrolled description these XORs are of 
the form 



    W2 = G(W2, subkey)      (Rule A, round 8)
    W1 = W1 ⊕ W2 ⊕ 8        (Rule A, round 8)
    W2 = W2 ⊕ W1 ⊕ 9        (Rule B, round 9)
    W1 = G(W1, subkey)      (Rule B, round 9)

which is equivalent to exchanging the words W1 and W2, and leaving W2 as the original W1 ⊕ 1:

    W2 = G(W2, subkey)
    exchange W1 and W2
    W1 = W1 ⊕ W2 ⊕ 8
    W2 = W2 ⊕ 1
    W1 = G(W1, subkey)

(the same situation occurs after round 24, with the round numbers 8 and 9 replaced by 24 and 25). Figure 2 describes this representation of Skipjack (only the first 16 rounds out of the 32 are listed; the next 16 rounds are identical except for the counter values).
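The claim that the two consecutive XORs at the Rule A/Rule B border amount to an exchange of W1 and W2 can be checked mechanically. The sketch below (the function names are ours) applies both forms to random 16-bit words; the G steps before and after are omitted because they are identical in both forms.

```python
import random

def border_xors(w1, w2, c_a=8, c_b=9):
    """Rule A XOR of round c_a followed by Rule B XOR of round c_b (unrolled form)."""
    w1 = w1 ^ w2 ^ c_a
    w2 = w2 ^ w1 ^ c_b
    return w1, w2

def exchange_form(w1, w2, c_a=8, c_b=9):
    """Exchange W1 and W2, then W1 ^= W2 ^ c_a and W2 ^= c_a ^ c_b."""
    w1, w2 = w2, w1
    w1 = w1 ^ w2 ^ c_a
    w2 = w2 ^ c_a ^ c_b
    return w1, w2
```

The same check passes with the constants 24 and 25, since 24 ⊕ 25 = 1 as well.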

Also, on the border between Rule B and Rule A (after round 16), there are 
two parallel applications of the G permutation on two different words, with no 
other linear mixing in between. 

Note that Rule A mixes the output of the G permutation into the input of the next G permutation, while Rule B mixes the input of a G permutation into the output of the previous G permutation (as does Rule A during decryption). Thus, during encryption Rule B rounds add little to the avalanche effect, and during decryption Rule A rounds add little to the avalanche effect.

In this paper we consider variants of Skipjack which are identical to the original version except for the removal of a few XOR operations. We use the name Skipjack-(i1, ..., ik) to denote the variant in which the XOR operations mixing two data words at rounds i1, ..., ik are removed, and the name Skipjack-3XOR as a more mnemonic name for Skipjack-(4,16,17), which is the main variant we attack. Note that the removal of these XOR operations does not remove the effect of any other operation (as could happen if we removed the XORs of the Feistel structure of G, which would eliminate the effect of the corresponding F tables).

Fig. 2. Skipjack.



3 Useful Observations

3.1 Observations Regarding the Key Schedule

The key schedule is cyclic in the sense that the same set of four key bytes (entering a single G permutation) is repeated every five rounds, and there are only five such sets. In addition, the key bytes are divided into two sets: the even bytes and the odd bytes. The even bytes always enter the even rounds of the G permutation, while the odd bytes always enter the odd rounds of the G permutation.
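Both key-schedule observations follow from the indexing cv[(4(k−1)+i) mod 10] for byte i (0 to 3) of the G permutation in round k. The sketch below checks them over all 32 rounds; reading "even round of G" as an even 0-based Feistel-round index i is our assumption about the numbering, and the helper names are ours.

```python
def key_byte_indices(k):
    """Indices of the key bytes keying G in round k (1..32): cv[(4(k-1)+i) mod 10], i = 0..3."""
    return [(4 * (k - 1) + i) % 10 for i in range(4)]

def schedule_summary():
    """Return (number of distinct 4-byte sets, period-5 check, parity check)."""
    sets = {tuple(key_byte_indices(k)) for k in range(1, 33)}
    period_ok = all(key_byte_indices(k) == key_byte_indices(k + 5) for k in range(1, 28))
    parity_ok = all(idx % 2 == i % 2
                    for k in range(1, 33)
                    for i, idx in enumerate(key_byte_indices(k)))
    return len(sets), period_ok, parity_ok
```

For instance, round 16 reuses the key bytes of round 1 (indices 0 to 3), which is the observation used in Section 4.1.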



3.2 Decryption 

As in most symmetric ciphers, decryption can be done using encryption with minor modifications. These modifications are (1) reordering the key bytes to K* = (cv7, cv6, ..., cv0, cv9, cv8), (2) reversing the order of the round counters, and then (3) encrypting the reordered ciphertext C* = (cb3, cb2, cb1, cb0, cb7, cb6, cb5, cb4), which gives the reordered plaintext P* = (pb3, pb2, pb1, pb0, pb7, pb6, pb5, pb4).

The mixings with the round numbers (counters) are often used to protect against related-key attacks. In Skipjack, if these mixings were removed, the following stronger property would hold: given a plaintext P = (pb0, pb1, ..., pb7), a key K = (cv0, ..., cv9) and a ciphertext C = (cb0, ..., cb7) such that C = Skipjack_K(P), decryption can be performed using encryption by P* = Skipjack_{K*}(C*), where K* = (cv7, cv6, ..., cv0, cv9, cv8), P* = (pb3, pb2, pb1, pb0, pb7, pb6, pb5, pb4), and C* = (cb3, cb2, cb1, cb0, cb7, cb6, cb5, cb4).
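The property can be spot-checked on a counter-free variant. The sketch below follows Figure 1 with the counter XORs removed and, as elsewhere in our sketches, uses a hypothetical placeholder F table (the real table is not reproduced); words are formed from byte pairs with the first byte in the high half. It encrypts random blocks, reorders ciphertext and key as described, and checks that encrypting the reordered ciphertext under K* yields the reordered plaintext.

```python
import random

def make_f_table(seed=7):
    # Hypothetical placeholder permutation, NOT the real Skipjack F table.
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table

def g_perm(word, key, F):
    g1, g2 = word >> 8, word & 0xFF
    g3 = F[g2 ^ key[0]] ^ g1
    g4 = F[g3 ^ key[1]] ^ g2
    g5 = F[g4 ^ key[2]] ^ g3
    g6 = F[g5 ^ key[3]] ^ g4
    return (g5 << 8) | g6

def encrypt_no_counter(block, cv, F):
    """32 rounds (8 A, 8 B, 8 A, 8 B) on an 8-byte block, counter mixing removed."""
    w1, w2, w3, w4 = ((block[2 * j] << 8) | block[2 * j + 1] for j in range(4))
    for k in range(1, 33):
        g = g_perm(w1, [cv[(4 * (k - 1) + i) % 10] for i in range(4)], F)
        if ((k - 1) // 8) % 2 == 0:      # Rule A
            w1, w2, w3, w4 = g ^ w4, g, w2, w3
        else:                             # Rule B
            w1, w2, w3, w4 = w4, g, w1 ^ w2, w3
    return [b for w in (w1, w2, w3, w4) for b in (w >> 8, w & 0xFF)]

def reorder_block(b):
    # (b0, ..., b7) -> (b3, b2, b1, b0, b7, b6, b5, b4)
    return [b[3], b[2], b[1], b[0], b[7], b[6], b[5], b[4]]

def reorder_key(cv):
    # (cv0, ..., cv9) -> (cv7, cv6, ..., cv0, cv9, cv8)
    return [cv[7], cv[6], cv[5], cv[4], cv[3], cv[2], cv[1], cv[0], cv[9], cv[8]]
```

Only a handful of random keys and blocks are tried in such a check, so it is a spot check of the stated property rather than a proof.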

This property could be used to reduce the complexity of exhaustive search of this Skipjack variant by a factor of almost 2 (26% of the key space rather than 50% on average), in a similar way to the complementation property of DES. Given the ciphertext C1 of some plaintext P, and the decrypted plaintext C2 of the related P* under the same unknown key, perform trial encryptions of P with 60% of the keys K (three keys of each cycle of five keys under rotation of the key by two bytes; efficient implementations first try two keys of each cycle, and only if all of them fail do they try the third key of each cycle). For each of these keys, compare the ciphertext to C1 and to C2* (i.e., C2 with its bytes reordered as above). If both comparisons fail, the unknown key is neither K nor K*. If one of them succeeds, we make two or three further trial encryptions, and if they also succeed we have found the key.
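One way to recover the 26% figure, under assumptions we state explicitly: each trial key K also rules out K*, so the first pass (two keys per cycle, 40% of the keys) covers 80% of the key space; the third keys cover the remaining 20%; and the unknown key is uniformly distributed, so on average half of the pass that contains it is needed:

$$
\mathbb{E}[\text{fraction of keys tried}] = 0.8 \cdot \frac{0.4}{2} + 0.2 \cdot \left(0.4 + \frac{0.2}{2}\right) = 0.16 + 0.10 = 0.26,
$$

compared with 0.5 for plain exhaustive search, a ratio of 0.5/0.26 ≈ 1.9.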



3.3 Complementation Properties of the G Permutation 

The G permutation has 2^16 − 1 complementation properties. Let $G_{K_0,K_1,K_2,K_3}(x_1, x_2) = (y_1, y_2)$, where $K_0, K_1, K_2, K_3, x_1, x_2, y_1, y_2$ are all byte values, and let $d_1, d_2$ be two byte values, not both zero. Then

$G_{K_0 \oplus d_1,\, K_1 \oplus d_2,\, K_2 \oplus d_1,\, K_3 \oplus d_2}(x_1 \oplus d_2,\, x_2 \oplus d_1) = (y_1 \oplus d_2,\, y_2 \oplus d_1).$
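In the derivation of this property the key difference cancels the data difference at every F-table input, so the contents of the F table play no role and the property can be spot-checked with any table. The sketch below uses a hypothetical placeholder permutation in place of the real F table; the helper names are ours.

```python
import random

def make_f_table(seed=11):
    # Hypothetical placeholder permutation, NOT the real Skipjack F table.
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table

def g_bytes(x1, x2, key, F):
    """G on the byte pair (x1, x2) with key bytes (K0, K1, K2, K3); returns (y1, y2)."""
    g3 = F[x2 ^ key[0]] ^ x1
    g4 = F[g3 ^ key[1]] ^ x2
    g5 = F[g4 ^ key[2]] ^ g3
    g6 = F[g5 ^ key[3]] ^ g4
    return g5, g6

def complemented(x1, x2, key, d1, d2, F):
    """Evaluate G on the complemented key and input of the property."""
    k0, k1, k2, k3 = key
    return g_bytes(x1 ^ d2, x2 ^ d1, (k0 ^ d1, k1 ^ d2, k2 ^ d1, k3 ^ d2), F)
```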

G has exactly one fixpoint for every subkey (this was identified by Frank Gifford and described in sci.crypt). Moreover, we observed that for every key and every value v of the form (0, b) or (b, 0), where 0 is a zero byte and b is an arbitrary byte value, G has exactly one value x for which G(x) = x ⊕ v. It is unknown whether this property can aid in the analysis of Skipjack.
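These counts can be checked by brute force over all 2^16 inputs of G. As far as we can tell from the Feistel equations, uniqueness only requires F to be a bijection, so the sketch below uses a hypothetical placeholder permutation instead of the real F table; it therefore illustrates the check rather than confirming the statement for the real table. The pair (b, 0) is encoded as the word b·256, i.e., the first byte is the high half.

```python
import random

def make_f_table(seed=13):
    # Hypothetical placeholder permutation, NOT the real Skipjack F table.
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table

def g_word(x, key, F):
    g1, g2 = x >> 8, x & 0xFF
    g3 = F[g2 ^ key[0]] ^ g1
    g4 = F[g3 ^ key[1]] ^ g2
    g5 = F[g4 ^ key[2]] ^ g3
    g6 = F[g5 ^ key[3]] ^ g4
    return (g5 << 8) | g6

def count_solutions(key, v, F):
    """Number of 16-bit x with G(x) = x XOR v (v = 0 counts fixpoints)."""
    return sum(1 for x in range(1 << 16) if g_word(x, key, F) == x ^ v)
```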



3.4 Differential and Linear Properties of the F Table 



We generated the differential and linear distribution tables of the F table, and 
found that in the difference distribution table: 

1. The maximal entry is 12 (while the average is 1). 

2. 39.9% of the entries have non-zero values. 

3. The value 0 appears in 39360 entries, 2 in 20559, 4 in 4855, 6 in 686, 8 in 
69, 10 in 5, and 12 in 2 entries. 

4. One-bit to one-bit differences are possible, such as 01_x → 01_x (where the subscript x denotes a hexadecimal representation), with probability 2/256.

In the linear approximation table: 

1. The maximal biases are 28 and −28 (i.e., probabilities of 1/2 + 28/256 and 1/2 − 28/256).

2. 89.3% of the entries have non-zero values. 

3. The absolute value of the bias is 0 in 7005 entries, 2 in 12456, 4 in 11244, 6 
in 9799, 8 in 7882, 10 in 6032, 12 in 4354, 14 in 2813, 16 in 1814, 18 in 1041, 
20 in 567, 22 in 317, 24 in 154, 26 in 54, and 28 in 3 entries. 

4. Unbalanced one-bit to one-bit linear approximations exist, such as 80_x → 80_x, with probability 1/2 + 20/256.
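The statistics above come from the difference distribution table (DDT) and linear approximation table (LAT) of the 8-bit F table. The sketch below shows one way to build both tables and tally their entries. It uses a hypothetical placeholder permutation because the real F table is not reproduced here, so it will not reproduce the counts listed above. The LAT is stored as biases (number of agreements minus 128), matching the 1/2 ± bias/256 convention of the text, and is computed with a fast Walsh-Hadamard transform per output mask rather than a triple loop; the function names are ours.

```python
import random
from collections import Counter

def make_sbox(seed=17):
    # Hypothetical placeholder 8-bit permutation, NOT the real Skipjack F table.
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table

def ddt(S):
    """Difference distribution table: T[a][b] = #{x : S[x] ^ S[x ^ a] == b}."""
    T = [[0] * 256 for _ in range(256)]
    for a in range(256):
        for x in range(256):
            T[a][S[x] ^ S[x ^ a]] += 1
    return T

def lat(S):
    """Bias table L[a][b] = #{x : a.x == b.S[x]} - 128 (u.v = parity of u & v)."""
    par = [bin(v).count("1") & 1 for v in range(256)]
    L = [[0] * 256 for _ in range(256)]
    for b in range(256):
        f = [1 - 2 * par[S[x] & b] for x in range(256)]   # (-1)^(b.S[x])
        h = 1
        while h < 256:                                     # in-place Walsh-Hadamard transform
            for i in range(0, 256, 2 * h):
                for j in range(i, i + h):
                    f[j], f[j + h] = f[j] + f[j + h], f[j] - f[j + h]
            h *= 2
        for a in range(256):
            L[a][b] = f[a] // 2                            # transform value = 2 * bias
    return L

def histogram(table):
    """How often each value occurs, ignoring the trivial (0, 0) entry."""
    return Counter(v for a, row in enumerate(table) for b, v in enumerate(row)
                   if (a, b) != (0, 0))
```

Passing the real F table as `S` is the intended use; we have not checked the resulting counts here.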



3.5 Differential and Linear Properties of the G Permutation 



Consider the F table, and let a and b be two byte values such that both a → b and b → a occur with non-zero probability. We can prove that the best possible characteristic of G must be of the form: input difference (a, 0), output difference (0, b), with the intermediate differences (a, 0) → (a, 0) → (a, b) → (0, b) → (0, b). There are 10778 pairs of such a and b, of which four have probability 48/2^16 = 2^-10.42. They are

1. a = 52_x, b = F5_x,
2. a = F5_x, b = 52_x,
3. a = 77_x, b = 92_x, and
4. a = 92_x, b = 77_x.
Most other characteristics of this form have probability 2^-14 (6672 pairs) and 2^-13 (3088 pairs). The remaining characteristics of this form have probabilities between 2^-? and 2^-?.

Given a and b, there are additional characteristics with three active F tables (rather than only two), and for the above values of a and b their probabilities are between 2^-? and 2^-?. These characteristics of G are (0, b) → (a, b) and (a, b) → (a, 0). We can combine them with the characteristics of the previous form and get cycles of three characteristics of the form (a, 0) → (0, b) → (a, b) → (a, 0).

We studied the differentials corresponding to these characteristics, and computed their exact probabilities by summing the probabilities of all the characteristics with the same external differences. We found that (a, 0) → (0, b) has the same probability as a differential and as a characteristic, as there are no other characteristics with the same external differences. The characteristics (0, b) → (a, b) and (a, b) → (a, 0), with the same a and b as above, have over a thousand small-probability counterparts with the same external differences, whose total probability is slightly smaller than the probabilities of the original characteristics. Thus, the probabilities of the differentials are almost twice those of the original characteristics (e.g., 137088/2^32 ≈ 2^-14.94 instead of 73728/2^32 ≈ 2^-15.83 in one of the cases).
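The claim that (a, 0) → (0, b) has the same probability as a differential and as a characteristic can be spot-checked by counting over all 2^16 inputs of G. For this two-active-F-table form, the number of inputs satisfying the differential works out to DDT[a][b] · DDT[b][a] for any key, so the probability is that product divided by 2^16 (48/2^16 for the best pairs listed above). The sketch uses a hypothetical placeholder F table and picks the first suitable (a, b) rather than the values listed above; the byte order (first byte of the pair in the high half of the word) is our convention.

```python
import random

def make_f_table(seed=19):
    # Hypothetical placeholder permutation, NOT the real Skipjack F table.
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table

def g_word(x, key, F):
    g1, g2 = x >> 8, x & 0xFF
    g3 = F[g2 ^ key[0]] ^ g1
    g4 = F[g3 ^ key[1]] ^ g2
    g5 = F[g4 ^ key[2]] ^ g3
    g6 = F[g5 ^ key[3]] ^ g4
    return (g5 << 8) | g6

def ddt_entry(F, a, b):
    """Number of byte inputs x with F[x] ^ F[x ^ a] == b."""
    return sum(1 for x in range(256) if F[x] ^ F[x ^ a] == b)

def count_g_differential(F, key, a, b):
    """Number of inputs x (out of 2^16) with G(x) ^ G(x ^ (a, 0)) == (0, b)."""
    din, dout = a << 8, b
    return sum(1 for x in range(1 << 16)
               if g_word(x, key, F) ^ g_word(x ^ din, key, F) == dout)
```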

We also investigated other differentials of G. The characteristics we described with probability of around 2^-10.42 (and other lower-probability characteristics with zero differences in the first and fourth rounds of the G permutation) do not have any counterparts, and thus the corresponding differentials have the same probabilities as the characteristics. The best other differential we are aware of is 002A_x → 0095_x with probability 2^-?, and the best possible differential with equal input and output differences is 7F7F_x → 7F7F_x with probability 2^-15.84.

We next consider the case of linear cryptanalysis. As linear characteristics are built in a similar way, where XORs are replaced by duplications and duplications are replaced by XORs of the subsets of parity bits [1], we can apply the same technique. In this case we have 52736 possible pairs of a and b. The best linear characteristic of G is based on a = b = 60_x, and its probability is 1/2 + 2 · 676/2^16 = 1/2 + 2^-5.6.

It is interesting to note that (due to its design) many criteria used in other 
ciphers are neither relevant to nor used in Skipjack. For example, a difference 
of one input bit in a DES S box cannot cause a difference of only one bit in its 
output, but there are many such instances in the F table of Skipjack. 

Another observation is that, due to the switchover from Rule A to Rule B iterations, the data between rounds 5 and 12 is very badly mixed. As mentioned earlier, on the border between the two rules (after rounds 8 and 24), the leftmost word is exchanged with word 2, and the new word 1 is XORed with the new word 2. We observed that the output of the G permutation in round 5 becomes the input to the G permutation in round 12, unaffected by other words (but XORed with the fixed value 8 ⊕ 9 = 1). Thus, this word is not affected by any other word during 8 consecutive rounds. A similar property occurs in word 3 from round 7 to round 11, and in word 4 from round 6 to round 10. On the other hand, from round 5 to round 12 word 2 (renamed later to word 1) is affected several times by the other words, and the G permutation is applied to it several times, but it does not affect other words. Moreover, from round 13 to round 16 this word affects, directly or indirectly, only two of the three other words, and therefore the input of the second word in round 5 never affects the fourth data word twelve rounds later.¹



4 Cryptanalysis of Skipjack Reduced to 16 Rounds 

4.1 Differential Cryptanalysis of Skipjack with Reduced Number of Rounds



The differential attack we describe here for 16-round Skipjack is considerably 
faster than exhaustive search. This attack is based on our original attack [2] 
with additional improvements based on Wagner’s observations [8]. 

The best characteristics of 16-round Skipjack that we are aware of use the characteristics of the G permutation described above. The plaintext difference is (a, 0, a, 0, 0, 0, 0, b), where a, b and 0 are eight-bit values and a, b are the values described in Section 3.5, and only six active G permutations (with a total of 14 active F tables) are required to achieve the ciphertext difference (0, b, 0, b, a, 0, 0, 0). There are four such characteristics, with probabilities of about 2^-72.9. When we replace the characteristics by the corresponding differentials of G, the probability grows to about 2^-?. However, when we view the two G permutations in rounds 8 and 9 (unaffected by differences from other words) as one new permutation, its probability is about 2^-?, and thus the probability of the differential grows to about 2^-?.

Given the ciphertexts of many plaintext pairs with the difference (a, 0, a, 0, 0, 0, 0, b), it is easy to identify and discard most of the wrong pairs in a 0R-attack. Such an attack requires about 2^? pairs. We observe that only a four-round characteristic of the first four rounds is required, with probability about 2^-?, and that when this characteristic holds, the truncated (word-wise) differences in rounds 5-16 are fixed. In this case we choose about 2^? chosen plaintext pairs, and can discard all but a fraction of 2^-? of the wrong pairs. Thus, about 2^6 = 64 pairs remain.

¹ This property was found by Wagner [8].

Now we use a second observation: the same set of subkeys is used in the first and the 16th rounds. We try all the 2^32 possible sets of subkeys, and for each remaining pair we encrypt the first round and verify that the characteristic of G holds, and decrypt the last round and verify whether the expected difference (i.e., the difference of the third ciphertext word) holds at the input of the last G permutation. The probability that a wrong set of subkeys does not discard a pair is 2^-? · 2^-? = 2^-?, and thus only the correct 32-bit subkey is expected to be proposed twice, by two different remaining pairs, and can therefore be identified. This attack can be applied efficiently in 2^? steps for each analyzed pair, i.e., with a total complexity of 2^? steps. Similar techniques (or even exhaustive search of the remaining 48 bits of the key) can complete the cryptanalysis.



4.2 Linear Cryptanalysis of Skipjack with Reduced Number of Rounds



Linear characteristics are built in a similar way, where XORs are replaced by duplications and duplications are replaced by XORs of the subsets of parity bits [1]. As Rule A and Rule B differ essentially in this way, we can carry out a similar analysis for linear cryptanalysis (except that we use linear characteristics rather than differentials). The probability of the best linear characteristic we found is about 1/2 + 2^-?, and thus the attack seems to require more known plaintexts than the total number of possible plaintexts. However, this number can be reduced below 2^64 by using shorter characteristics.



4.3 Modified Variants of Skipjack 



Skipjack alternates eight rounds of Rule A and eight rounds of Rule B. In this section we investigate whether other mixing orders strengthen or weaken the cipher. A simple example of a modified design alternates four ‘Rule A’ rounds and four ‘Rule B’ rounds. We found an attack on this 16-round cipher which requires only about 2^? chosen plaintexts and about 2^? steps of analysis to find the subkey of round 3.

When Rule A rounds and Rule B rounds appear in reverse order (i.e., Rule B is applied first), and four rounds of each are applied consecutively, only two pairs are required to find the last subkey.

These few examples indicate that the order of Rule A and Rule B rounds can 
have a major impact on the security of modified variants of Skipjack. Further 
study of modified variants will shed more light on Skipjack’s design principles. 
5 A New Cryptographic Tool: The Yoyo Game



Consider the first 16 rounds of Skipjack, and consider pairs of plaintexts P = (w1, w2, w3, w4) and P* = (w1*, w2*, w3*, w4*) whose partial encryptions differ only in the second word at the input of round 5 (we will refer to this as the property from now on). As this word does not affect any other word until it becomes word 1 in round 12, the other three words have difference zero between rounds 5 and 12.

We next observe that, given a pair with this property, we can exchange the second words of the plaintexts (which cannot be equal if the property holds), and the new pair of plaintexts (w1, w2*, w3, w4) and (w1*, w2, w3*, w4*) still satisfies the property, i.e., differs only in the second word at the input of round 5. Given the ciphertexts, we can carry out a similar operation of exchanging words 1.

The Yoyo game starts by choosing an arbitrary pair of distinct plaintexts P0 and P0*. The plaintexts are encrypted to C0 and C0*. We exchange the first words of the two ciphertexts as described above, receiving C1 and C1*, and decrypt them to get P1 and P1*. Now we exchange the second words of the plaintexts, receiving P2 and P2*, and encrypt them to get C2 and C2*. The Yoyo game repeats this forever.

In this game, whenever we start with a pair of plaintexts which satisfies the property, all the resulting pairs of encryptions must also satisfy the property, and if we start with a pair of plaintexts which does not satisfy the property, none of the resulting pairs can satisfy it.

It is easy to identify whether the pairs in a Yoyo game satisfy the above property, by checking whether some of the pairs obtained in the game have a non-zero difference in the third word of the plaintexts or in the fourth word of the ciphertexts. If one of these differences is non-zero, the pair cannot satisfy the property. On the other hand, if the pair does not satisfy the property, there is only a probability of 2^-16 that the next pair in the game has difference zero in the checked word, and thus games in which the property is not satisfied can be stopped after only a few steps. If the game is not stopped within a few steps, we conclude with overwhelming probability that the property is satisfied.
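The game can be simulated on Skipjack reduced to its first 16 rounds. The sketch below is an illustration under our own conventions, not the authors' code. It uses the word order of Figure 1 (not the stationary-register order of Figure 2); in that order we take the property to mean that the two states entering round 5 differ only in w2, which is our reading but is consistent with the checks above, since such pairs have equal third plaintext words and equal fourth ciphertext words. The F table and key are hypothetical placeholders, word indices are 0-based in the code, and all function names are ours.

```python
import random

# Hypothetical placeholder F table (NOT the real Skipjack table) and an arbitrary key.
_rng = random.Random(23)
F = list(range(256))
_rng.shuffle(F)
CV = [0x2B, 0x7E, 0x15, 0x16, 0x28, 0xAE, 0xD2, 0xA6, 0xAB, 0xF7]

def key4(k):
    """Key bytes of the G permutation in round k, taken cyclically from CV."""
    return [CV[(4 * (k - 1) + i) % 10] for i in range(4)]

def g(w, k):
    k0, k1, k2, k3 = key4(k)
    g1, g2 = w >> 8, w & 0xFF
    g3 = F[g2 ^ k0] ^ g1
    g4 = F[g3 ^ k1] ^ g2
    g5 = F[g4 ^ k2] ^ g3
    g6 = F[g5 ^ k3] ^ g4
    return (g5 << 8) | g6

def g_inv(w, k):
    k0, k1, k2, k3 = key4(k)
    g5, g6 = w >> 8, w & 0xFF
    g4 = F[g5 ^ k3] ^ g6
    g3 = F[g4 ^ k2] ^ g5
    g2 = F[g3 ^ k1] ^ g4
    g1 = F[g2 ^ k0] ^ g3
    return (g1 << 8) | g2

def enc(s, first=1, last=16):
    """Rounds first..last (Rule A for rounds 1-8, Rule B for rounds 9-16)."""
    w1, w2, w3, w4 = s
    for k in range(first, last + 1):
        x = g(w1, k)
        if k <= 8:
            w1, w2, w3, w4 = x ^ w4 ^ k, x, w2, w3
        else:
            w1, w2, w3, w4 = w4, x, w1 ^ w2 ^ k, w3
    return (w1, w2, w3, w4)

def dec(s, first=1, last=16):
    """Inverse of enc over the same rounds."""
    w1, w2, w3, w4 = s
    for k in range(last, first - 1, -1):
        x = g_inv(w2, k)
        if k <= 8:
            w1, w2, w3, w4 = x, w3, w4, w1 ^ w2 ^ k
        else:
            w1, w2, w3, w4 = x, w3 ^ x ^ k, w4, w1
    return (w1, w2, w3, w4)

def swap(p, q, i):
    """Exchange word i (0-based) between two 4-word texts."""
    p, q = list(p), list(q)
    p[i], q[i] = q[i], p[i]
    return tuple(p), tuple(q)

def yoyo(p, q, steps=8):
    """Play `steps` rounds of the game; False as soon as a zero-difference check fails."""
    for _ in range(steps):
        if p[2] != q[2]:                    # plaintext word 3
            return False
        c, d = swap(enc(p), enc(q), 0)      # exchange ciphertext word 1
        if c[3] != d[3]:                    # ciphertext word 4
            return False
        p, q = swap(dec(c), dec(d), 1)      # decrypt, exchange plaintext word 2
    return True
```

A right pair can be manufactured by picking two round-5 states that differ only in w2 and decrypting them through rounds 4 to 1; the game then keeps both checks satisfied, while a random pair with equal third plaintext words is rejected almost immediately.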

This game can be used for several purposes. The first is to identify whether 
a given pair satisfies the above property, and to generate many additional pairs 
satisfying the property. 

This can be used to attack Skipjack reduced to 16 rounds in just 2^? steps. For the sake of simplicity, we describe a suboptimal implementation with complexity 2^?. In this version we choose 2^17 plaintexts whose third word is fixed. This set of plaintexts defines about 2^33 possible pairs, of which about 2^17 candidate pairs have difference zero in the fourth word of the ciphertexts, and of which about one or two pairs are expected to satisfy the property. Up to this point, this attack is similar to Wagner's attack on 16-round Skipjack [8]. We then use the Yoyo game to reduce the complexity of the analysis considerably. We play the game for each of the 2^17 candidate pairs, and within a few steps of the game discard all the pairs which do not satisfy the property. We are left with one pair which satisfies the property, and with several additional pairs generated during the Yoyo game which also satisfy it. Using two or three of these pairs, we can analyze the last round of the cipher and find the unique subkey of the last round that satisfies all the requirements, with complexity about 2^?. The rest of the key bytes can be found by similar techniques.

This game can also be used as a distinguisher which can decide whether an 
unknown encryption algorithm (given as an oracle) is Skipjack reduced to 16 
rounds or a random permutation. 

The above Yoyo game keeps three words with difference zero in each pair. We note that there is another (less useful) Yoyo game for Skipjack reduced to 14 rounds (specifically, rounds 2 to 15), which keeps only one word with difference zero. Consider pairs of encryptions P = (w1, w2, w3, w4) and P* = (w1*, w2*, w3*, w4*) which have the same data in the leftmost word at the input of round 5. As this word is not affected by any other word until it becomes word 2 in round 12, we can conclude that both encryptions have the same data in word 2 after round 12. Given a pair with such an equality in the data, we can exchange the first words of the plaintexts, and the new pair of plaintexts (w1*, w2, w3, w4) and (w1, w2*, w3*, w4*) still has the same property of equality at the input of round 5. Moreover, if the first words of the plaintexts are equal (i.e., w1 = w1*, so exchanging them does nothing), we can exchange the second words (w2 with w2*) and get the same property. If they are also equal, we can exchange w3 with w3* and get the same property. If they are also equal, we exchange w4 with w4*. However, if the property holds, this last case is impossible, as at least two words of the two plaintexts must be different. Given the ciphertexts, we can carry out a similar operation of exchanging words 2; if words 2 are equal, we exchange words 1, then words 4, and then words 3. Also in this case, a difference in only one word ensures that the property is not satisfied. This Yoyo game is similar to the previous game, except for its modified exchange process, and it behaves similarly with respect to the new difference property.



6 Cryptanalysis of Skipjack-3XOR 

In this section we analyze Skipjack-3XOR, which is identical to the original 32- 
round Skipjack except for the removal of the three XOR operations which mix 
16-bit data words with their neighbors at rounds 4, 16 and 17. We show that 
this version is completely insecure, since it can be broken in one million steps 
using only about 500 chosen plaintexts. 
The starting point of the attack is Wagner's observation [8] that the differential characteristic we used in the previous section can use truncated (i.e., word-wise) differences [5]. The attack uses the following characteristic of Skipjack-3XOR: for any 16-bit non-zero value a, the plaintext difference (0, a, 0, 0) leads to the difference (b, c, 0, 0) after round 16 with probability 1, which in turn leads to a difference (d, 0, 0, 0) after round 28 with probability 2^-16, for some unspecified non-zero values b, c, and d. This difference leads to some difference (e, f, g, 0) in the ciphertexts, for some e, f, and g.

The attack requires two pairs of plaintexts with such a differential behavior. To get them, encrypt 2^9 = 512 distinct plaintexts which are identical except in their second word. They give rise to about 2^18/2 = 2^17 pairs, and each pair has the required property with probability 2^-16. The two right pairs can be easily recognized, since the two ciphertexts in each pair must be equal in their last 16 bits.

The basic steps of the attack are: 

1. We know the input differences and the actual outputs of the 32nd G permutation. Each right pair yields a subset of about 2^16 possible values of the key bytes cv4, ..., cv7, and the intersection of the two subsets is likely to define these 32 key bits (almost) uniquely. This part can be implemented in about 2^? evaluations of G.

2. The 29th G permutation shares two key bytes, cv4 and cv5, with the 32nd G permutation, and these are already known. 2^16 possible combinations of the two key bytes cv2, cv3 and the inputs to the 30th G permutation in both pairs can be found. A careful implementation of this step requires a time complexity equivalent to 2^? evaluations of G.

3. For each of the 2^16 combinations we still miss the key bytes cv8, cv9 entering the last two F tables in round 30, and the key bytes cv0 and cv1 entering the first two F tables in round 31. Together they are equivalent to a single G, which we call G'. In each right pair, the two encryptions have the same values in G'. We view both right pairs as a super pair of two G' evaluations, whose actual inputs and outputs are known. The analysis of G' takes about the equivalent of 2^9 G evaluations, and thus the total complexity is equivalent to about 2^25 G evaluations.



Since each Skipjack encryption contains 2^5 = 32 G evaluations, the total time complexity of this cryptanalytic attack is equivalent to about one million Skipjack encryptions, and can be carried out in seconds on a personal computer.
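The bookkeeping of this attack can be cross-checked in a few lines: which key bytes the last four G permutations use (from the cyclic key schedule of Section 2), how many pairs 512 plaintexts give and how many right pairs to expect at probability 2^-16, and how the G-evaluation count converts into encryptions. The per-combination cost of about 2^9 G evaluations is inferred from the stated totals (2^16 combinations of the two key bytes cv2, cv3, and about one million encryptions of 32 G evaluations each); the helper names are ours.

```python
from math import comb, log2

def key_byte_indices(k):
    """Key bytes cv[i] keying the G permutation of round k (1..32)."""
    return [(4 * (k - 1) + i) % 10 for i in range(4)]

def attack_budget(n_plaintexts=512, p_right=2 ** -16, g_per_encryption=32):
    pairs = comb(n_plaintexts, 2)          # pairs among the chosen plaintexts
    right_pairs = pairs * p_right          # expected number of right pairs
    g_evals = 2 ** 16 * 2 ** 9             # 2^16 combinations x ~2^9 G evaluations each
    encryptions = g_evals / g_per_encryption
    return pairs, right_pairs, log2(g_evals), encryptions
```

With the defaults this gives 130816 pairs (about 2^17), about two expected right pairs, 2^25 G evaluations, and 2^20 (about one million) encryption equivalents.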